[committed] libstdc++: Improve std::lock algorithm

public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed

* [committed] libstdc++: Improve std::lock algorithm
@ 2021-06-21 17:31 Jonathan Wakely
  2021-06-22  9:07 ` Matthias Kretz
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-21 17:31 UTC (permalink / raw)
  To: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2863 bytes --]

The current std::lock algorithm is the one called "persistent" in Howard
Hinnant's https://howardhinnant.github.io/dining_philosophers.html post.
While it tends to perform acceptably fast, it wastes a lot of CPU cycles
by continuously locking and unlocking the uncontended mutexes.
Effectively, it's a spin lock with no back-off.

This replaces it with the one Howard calls "smart and polite". It's
smart, because when a Mi.try_lock() call fails because mutex Mi is
contended, the algorithm reorders the mutexes until Mi is first, then
calls Mi.lock(), to block until Mi is no longer contended.  It's
polite because it uses std::this_thread::yield() between the failed
Mi.try_lock() call and the Mi.lock() call. (In reality it uses
__gthread_yield() directly, because using this_thread::yield() would
require shuffling code around to avoid a circular dependency.)

This version of the algorithm is inspired by some hints from Howard, so
that it has strictly bounded stack usage. As the comment in the code
says:

// This function can recurse up to N levels deep, for N = 1+sizeof...(L1).
// On each recursion the lockables are rotated left one position,
// e.g. depth 0: l0, l1, l2; depth 1: l1, l2, l0; depth 2: l2, l0, l1.
// When a call to l_i.try_lock() fails it recurses/returns to depth=i
// so that l_i is the first argument, and then blocks until l_i is locked.

The 'i' parameter is the desired permuation of the lockables, and the
'depth' parameter is the depth in the call stack of the current
instantiation of the function template. If i == depth then the function
calls l0.lock() and then l1.try_lock()... for each lockable in the
parameter pack l1.  If i > depth then the function rotates the lockables
to the left one place, and calls itself again to go one level deeper.
Finally, if i < depth then the function returns to a shallower depth,
equivalent to a right rotate of the lockables.  When a call to
try_lock() fails, i is set to the index of the contended lockable, so
that the next call to l0.lock() will use the contended lockable as l0.

This commit also replaces the std::try_lock implementation details. The
new code is identical in behaviour, but uses a pair of constrained
function templates. This avoids instantiating a class template, and is a
litle simpler to call where used in std::__detail::__lock_impl and
std::try_lock.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/std/mutex (__try_to_lock): Move to __detail namespace.
	(struct __try_lock_impl): Replace with ...
	(__detail::__try_lock_impl<Idx>(tuple<Lockables...>&)): New
	function templates to implement std::try_lock.
	(try_lock): Use new __try_lock_impl.
	(__detail::__lock_impl(int, int&, L0&, L1&...)): New function
	template to implement std::lock.
	(lock): Use __lock_impl.

Tested powerpc64le-linux. Committed to trunk.


[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 8394 bytes --]

commit 6cf0040fff78a665db31a6a8dee60b12eef2e590
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Mon Jun 21 13:35:18 2021

    libstdc++: Improve std::lock algorithm
    
    The current std::lock algorithm is the one called "persistent" in Howard
    Hinnant's https://howardhinnant.github.io/dining_philosophers.html post.
    While it tends to perform acceptably fast, it wastes a lot of CPU cycles
    by continuously locking and unlocking the uncontended mutexes.
    Effectively, it's a spin lock with no back-off.
    
    This replaces it with the one Howard calls "smart and polite". It's
    smart, because when a Mi.try_lock() call fails because mutex Mi is
    contended, the algorithm reorders the mutexes until Mi is first, then
    calls Mi.lock(), to block until Mi is no longer contended.  It's
    polite because it uses std::this_thread::yield() between the failed
    Mi.try_lock() call and the Mi.lock() call. (In reality it uses
    __gthread_yield() directly, because using this_thread::yield() would
    require shuffling code around to avoid a circular dependency.)
    
    This version of the algorithm is inspired by some hints from Howard, so
    that it has strictly bounded stack usage. As the comment in the code
    says:
    
    // This function can recurse up to N levels deep, for N = 1+sizeof...(L1).
    // On each recursion the lockables are rotated left one position,
    // e.g. depth 0: l0, l1, l2; depth 1: l1, l2, l0; depth 2: l2, l0, l1.
    // When a call to l_i.try_lock() fails it recurses/returns to depth=i
    // so that l_i is the first argument, and then blocks until l_i is locked.
    
    The 'i' parameter is the desired permuation of the lockables, and the
    'depth' parameter is the depth in the call stack of the current
    instantiation of the function template. If i == depth then the function
    calls l0.lock() and then l1.try_lock()... for each lockable in the
    parameter pack l1.  If i > depth then the function rotates the lockables
    to the left one place, and calls itself again to go one level deeper.
    Finally, if i < depth then the function returns to a shallower depth,
    equivalent to a right rotate of the lockables.  When a call to
    try_lock() fails, i is set to the index of the contended lockable, so
    that the next call to l0.lock() will use the contended lockable as l0.
    
    This commit also replaces the std::try_lock implementation details. The
    new code is identical in behaviour, but uses a pair of constrained
    function templates. This avoids instantiating a class template, and is a
    litle simpler to call where used in std::__detail::__lock_impl and
    std::try_lock.
    
    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
    
    libstdc++-v3/ChangeLog:
    
            * include/std/mutex (__try_to_lock): Move to __detail namespace.
            (struct __try_lock_impl): Replace with ...
            (__detail::__try_lock_impl<Idx>(tuple<Lockables...>&)): New
            function templates to implement std::try_lock.
            (try_lock): Use new __try_lock_impl.
            (__detail::__lock_impl(int, int&, L0&, L1&...)): New function
            template to implement std::lock.
            (lock): Use __lock_impl.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index d4c5d13f654..5f2d8f9ee7b 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -512,47 +512,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // _GLIBCXX_HAS_GTHREADS
 
   /// @cond undocumented
-  template<typename _Lock>
-    inline unique_lock<_Lock>
-    __try_to_lock(_Lock& __l)
-    { return unique_lock<_Lock>{__l, try_to_lock}; }
+  namespace __detail
+  {
+    template<typename _Lockable>
+      inline unique_lock<_Lockable>
+      __try_to_lock(_Lockable& __l)
+      { return unique_lock<_Lockable>{__l, try_to_lock}; }
 
-  template<int _Idx, bool _Continue = true>
-    struct __try_lock_impl
-    {
-      template<typename... _Lock>
-	static void
-	__do_try_lock(tuple<_Lock&...>& __locks, int& __idx)
-	{
-          __idx = _Idx;
-          auto __lock = std::__try_to_lock(std::get<_Idx>(__locks));
-          if (__lock.owns_lock())
-            {
-	      constexpr bool __cont = _Idx + 2 < sizeof...(_Lock);
-	      using __try_locker = __try_lock_impl<_Idx + 1, __cont>;
-	      __try_locker::__do_try_lock(__locks, __idx);
-              if (__idx == -1)
-                __lock.release();
-            }
-	}
-    };
+    // Lock the last element of the tuple, after all previous ones are locked.
+    template<int _Idx, typename... _Lockables>
+      inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
+      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+      {
+	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+	  {
+	    __lock.release();
+	    return -1;
+	  }
+	else
+	  return _Idx;
+      }
 
-  template<int _Idx>
-    struct __try_lock_impl<_Idx, false>
-    {
-      template<typename... _Lock>
-	static void
-	__do_try_lock(tuple<_Lock&...>& __locks, int& __idx)
-	{
-          __idx = _Idx;
-          auto __lock = std::__try_to_lock(std::get<_Idx>(__locks));
-          if (__lock.owns_lock())
-            {
-              __idx = -1;
-              __lock.release();
-            }
-	}
-    };
+    // Lock tuple elements starting from _Idx.
+    template<int _Idx, typename... _Lockables>
+      inline __enable_if_t<_Idx + 1 != sizeof...(_Lockables), int>
+      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+      {
+	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+	  {
+	    int __idx = __detail::__try_lock_impl<_Idx + 1>(__lockables);
+	    if (__idx == -1)
+	      __lock.release();
+	    return __idx;
+	  }
+	else
+	  return _Idx;
+      }
+
+  } // namespace __detail
   /// @endcond
 
   /** @brief Generic try_lock.
@@ -569,12 +566,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     int
     try_lock(_Lock1& __l1, _Lock2& __l2, _Lock3&... __l3)
     {
-      int __idx;
-      auto __locks = std::tie(__l1, __l2, __l3...);
-      __try_lock_impl<0>::__do_try_lock(__locks, __idx);
-      return __idx;
+      auto __lockables = std::tie(__l1, __l2, __l3...);
+      return __detail::__try_lock_impl<0>(__lockables);
     }
 
+  /// @cond undocumented
+  namespace __detail
+  {
+    // This function can recurse up to N levels deep, for N = 1+sizeof...(L1).
+    // On each recursion the lockables are rotated left one position,
+    // e.g. depth 0: l0, l1, l2; depth 1: l1, l2, l0; depth 2: l2, l0, l1.
+    // When a call to l_i.try_lock() fails it recurses/returns to depth=i
+    // so that l_i is the first argument, and then blocks until l_i is locked.
+    template<typename _L0, typename... _L1>
+      void
+      __lock_impl(int& __i, int __depth, _L0& __l0, _L1&... __l1)
+      {
+	while (__i >= __depth)
+	  {
+	    if (__i == __depth)
+	      {
+		int __failed = 1; // index that couldn't be locked
+		{
+		  unique_lock<_L0> __first(__l0);
+		  auto __rest = std::tie(__l1...);
+		  __failed += __detail::__try_lock_impl<0>(__rest);
+		  if (!__failed)
+		    {
+		      __i = -1; // finished
+		      __first.release();
+		      return;
+		    }
+		}
+#ifdef _GLIBCXX_USE_SCHED_YIELD
+		__gthread_yield();
+#endif
+		constexpr auto __n = 1 + sizeof...(_L1);
+		__i = (__depth + __failed) % __n;
+	      }
+	    else // rotate left until l_i is first.
+	      __detail::__lock_impl(__i, __depth + 1, __l1..., __l0);
+	  }
+      }
+
+  } // namespace __detail
+  /// @endcond
+
   /** @brief Generic lock.
    *  @param __l1 Meets Lockable requirements (try_lock() may throw).
    *  @param __l2 Meets Lockable requirements (try_lock() may throw).
@@ -590,19 +627,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
-      while (true)
-        {
-          using __try_locker = __try_lock_impl<0, sizeof...(_L3) != 0>;
-          unique_lock<_L1> __first(__l1);
-          int __idx;
-          auto __locks = std::tie(__l2, __l3...);
-          __try_locker::__do_try_lock(__locks, __idx);
-          if (__idx == -1)
-            {
-              __first.release();
-              return;
-            }
-        }
+      int __i = 0;
+      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
     }
 
 #if __cplusplus >= 201703L

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [committed] libstdc++: Improve std::lock algorithm
  2021-06-21 17:31 [committed] libstdc++: Improve std::lock algorithm Jonathan Wakely
@ 2021-06-22  9:07 ` Matthias Kretz
  2021-06-22 12:51   ` [PATCH v2] " Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Matthias Kretz @ 2021-06-22  9:07 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Jonathan Wakely

On Monday, 21 June 2021 19:31:59 CEST Jonathan Wakely via Gcc-patches wrote:
> diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
> index d4c5d13f654..5f2d8f9ee7b 100644
> --- a/libstdc++-v3/include/std/mutex
> +++ b/libstdc++-v3/include/std/mutex
> @@ -512,47 +512,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif // _GLIBCXX_HAS_GTHREADS
> 
>    /// @cond undocumented
> -  template<typename _Lock>
> -    inline unique_lock<_Lock>
> -    __try_to_lock(_Lock& __l)
> -    { return unique_lock<_Lock>{__l, try_to_lock}; }
> +  namespace __detail
> +  {
> +    template<typename _Lockable>
> +      inline unique_lock<_Lockable>
> +      __try_to_lock(_Lockable& __l)
> +      { return unique_lock<_Lockable>{__l, try_to_lock}; }
> 
> -  template<int _Idx, bool _Continue = true>
> -    struct __try_lock_impl
> -    {
> -      template<typename... _Lock>
> -       static void
> -       __do_try_lock(tuple<_Lock&...>& __locks, int& __idx)
> -       {
> -          __idx = _Idx;
> -          auto __lock = std::__try_to_lock(std::get<_Idx>(__locks));
> -          if (__lock.owns_lock())
> -            {
> -             constexpr bool __cont = _Idx + 2 < sizeof...(_Lock);
> -             using __try_locker = __try_lock_impl<_Idx + 1, __cont>;
> -             __try_locker::__do_try_lock(__locks, __idx);
> -              if (__idx == -1)
> -                __lock.release();
> -            }
> -       }
> -    };
> +    // Lock the last element of the tuple, after all previous ones are
> locked. +    template<int _Idx, typename... _Lockables>
> +      inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
> +      __try_lock_impl(tuple<_Lockables&...>& __lockables)

Couldn't you drop the need for enable_if and tuple if you define the function 
like this? (Or - without constexpr if - two overloads with 
__try_lock_impl(_L1& __l1) and __try_lock_impl(_L1& __l1, _L2& __l2, 
_Lockables&... __lockables)

    template<typename _L1, typename... _Lockables>
      inline int
      __try_lock_impl(_L1& __l1, _Lockables&... __lockables)
      {
	if (auto __lock = __detail::__try_to_lock(__l1))
	  {
	    if constexpr (sizeof...(_Lockables))
	      {
		int __idx = __detail::__try_lock_impl(__lockables...);
		if (__idx >= 0)
		  return __idx + 1;
	      }
	    __lock.release();
	    return -1;
	  }
	else
	  return 0;
      }

> [...]
> +    template<typename _L0, typename... _L1>
> +      void
> +      __lock_impl(int& __i, int __depth, _L0& __l0, _L1&... __l1)
> +      {

How about optimizing a likely common case where all lockables have the same 
type? In that case we don't require recursion and can manage stack usage much 
simpler:

  if constexpr ((is_same_v<_L0, _L1> && ...))
    {
      constexpr int _Np = 1 + sizeof...(_L1);
      std::array<unique_lock<_L0>, _Np> __locks = {
	{__l0, defer_lock}, {__l1, defer_lock}...
      };
      int __first = 0;
      do {
	__locks[__first].lock();
	for (int __j = 1; __j < _Np; ++__j)
	  {
	    const int __idx = (__first + __j) % _Np;
	    if (!__locks[__idx].try_lock())
	      {
		for (int __k = __idx; __k != __first;
		    __k = __k == 1 ? _Np : __k - 1)
		  __locks[__k - 1].unlock();
		__first = __idx;
		break;
	      }
	  }
      } while (!__locks[__first]);
      for (int __j = 0; __j < _Np; ++__j)
	__locks[__j].release();
    }
  else


> +       while (__i >= __depth)
> +         {
> +           if (__i == __depth)
> +             {
> +               int __failed = 1; // index that couldn't be locked
> +               {
> +                 unique_lock<_L0> __first(__l0);
> +                 auto __rest = std::tie(__l1...);
> +                 __failed += __detail::__try_lock_impl<0>(__rest);
> +                 if (!__failed)
> +                   {
> +                     __i = -1; // finished
> +                     __first.release();
> +                     return;
> +                   }
> +               }
> +#ifdef _GLIBCXX_USE_SCHED_YIELD
> +               __gthread_yield();
> +#endif
> +               constexpr auto __n = 1 + sizeof...(_L1);
> +               __i = (__depth + __failed) % __n;
> +             }
> +           else // rotate left until l_i is first.
> +             __detail::__lock_impl(__i, __depth + 1, __l1..., __l0);
> +         }
> +      }
> +
> +  } // namespace __detail
> +  /// @endcond
> +
>    /** @brief Generic lock.
>     *  @param __l1 Meets Lockable requirements (try_lock() may throw).
>     *  @param __l2 Meets Lockable requirements (try_lock() may throw).
> @@ -590,19 +627,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      void
>      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>      {
> -      while (true)
> -        {
> -          using __try_locker = __try_lock_impl<0, sizeof...(_L3) != 0>;
> -          unique_lock<_L1> __first(__l1);
> -          int __idx;
> -          auto __locks = std::tie(__l2, __l3...);
> -          __try_locker::__do_try_lock(__locks, __idx);
> -          if (__idx == -1)
> -            {
> -              __first.release();
> -              return;
> -            }
> -        }
> +      int __i = 0;
> +      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
>      }
> 
>  #if __cplusplus >= 201703L


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] libstdc++: Improve std::lock algorithm
  2021-06-22  9:07 ` Matthias Kretz
@ 2021-06-22 12:51   ` Jonathan Wakely
  2021-06-22 13:21     ` Matthias Kretz
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-22 12:51 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: libstdc++, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 4084 bytes --]

On Tue, 22 Jun 2021 at 10:07, Matthias Kretz wrote:
>
> On Monday, 21 June 2021 19:31:59 CEST Jonathan Wakely via Gcc-patches wrote:
> > +    // Lock the last element of the tuple, after all previous ones are
> > locked. +    template<int _Idx, typename... _Lockables>
> > +      inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
> > +      __try_lock_impl(tuple<_Lockables&...>& __lockables)
>
> Couldn't you drop the need for enable_if and tuple if you define the function
> like this? (Or - without constexpr if - two overloads with
> __try_lock_impl(_L1& __l1) and __try_lock_impl(_L1& __l1, _L2& __l2,
> _Lockables&... __lockables)
>
>     template<typename _L1, typename... _Lockables>
>       inline int
>       __try_lock_impl(_L1& __l1, _Lockables&... __lockables)
>       {
>         if (auto __lock = __detail::__try_to_lock(__l1))
>           {
>             if constexpr (sizeof...(_Lockables))
>               {
>                 int __idx = __detail::__try_lock_impl(__lockables...);
>                 if (__idx >= 0)
>                   return __idx + 1;
>               }
>             __lock.release();
>             return -1;
>           }
>         else
>           return 0;
>       }

Yes, I did try something like that, but we can't use if-constexpr
unconditionally. Doing it with two overloads still needed enable_if or
tag dispatching, but that's because I retained the use of std::tuple,
so was passing around the whole parameter pack. With your suggestion
to also drop std::tuple the number of parameters decides which
function we call. And we don't instantiate std::tuple. And we can also
get rid of the __try_to_lock function, which was only used to deduce
the lock type rather than use tuple_element to get it. That's much
nicer.

>
> > [...]
> > +    template<typename _L0, typename... _L1>
> > +      void
> > +      __lock_impl(int& __i, int __depth, _L0& __l0, _L1&... __l1)
> > +      {
>
> How about optimizing a likely common case where all lockables have the same
> type? In that case we don't require recursion and can manage stack usage much
> simpler:

The stack usage is bounded by the number of mutexes being locked,
which is unlikely to get large, but we can do that.

We can do it for try_lock too:

  template<typename _L1, typename _L2, typename... _L3>
    int
    try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
    {
#if __cplusplus >= 201703L
      if constexpr (is_same_v<_L1, _L2>
                    && (is_same_v<_L1, _L3> && ...))
        {
          constexpr int _Np = 2 + sizeof...(_L3);
          unique_lock<_L1> __locks[_Np] = {
              {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3, try_to_lock}...
          };
          for (int __i = 0; __i < _Np; ++__i)
            if (!__locks[__i])
              return __i;
          for (auto& __l : __locks)
            __l.release();
          return -1;
        }
      else
#endif
      return __detail::__try_lock_impl(__l1, __l2, __l3...);
    }



>
>   if constexpr ((is_same_v<_L0, _L1> && ...))
>     {
>       constexpr int _Np = 1 + sizeof...(_L1);
>       std::array<unique_lock<_L0>, _Np> __locks = {
>         {__l0, defer_lock}, {__l1, defer_lock}...
>       };
>       int __first = 0;
>       do {
>         __locks[__first].lock();
>         for (int __j = 1; __j < _Np; ++__j)
>           {
>             const int __idx = (__first + __j) % _Np;
>             if (!__locks[__idx].try_lock())
>               {
>                 for (int __k = __idx; __k != __first;
>                     __k = __k == 1 ? _Np : __k - 1)
>                   __locks[__k - 1].unlock();

This loop doesn't work if any try_lock fails when first==0, because
the loop termination condition is never reached.

I find this a bit easier to understand than the loop above, and
correct (I think):

  for (int __k = __j; __k != 0; --__k)
    __locks[(__first + __k - 1) % _Np].unlock();

I'll finish testing the attached patch. I should probably add more
tests, so that each test is run for a set of lockables of the same
type, and also for lockables of different types.

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 6081 bytes --]

commit 7d7cf35ed3c4b9d8c7fc4a52b4c7b1788c85c46d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Jun 22 13:35:19 2021

    libstdc++: Simplify std::try_lock and std::lock further
    
    The std::try_lock and std::lock algorithms can use iteration instead of
    recursion when all lockables have the same type and can be held by an
    array of unique_lock<L> objects.
    
    By making this change to __detail::__try_lock_impl it also benefits
    __detail::__lock_impl, which uses it. For std::lock we can just put the
    iterative version directly in std::lock, to avoid making any call to
    __detail::__lock_impl.
    
    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
    
    Co-authored-by: Matthias Kretz <m.kretz@gsi.de>
    
    libstdc++-v3/ChangeLog:
    
            * include/std/mutex (lock): Replace recursion with iteration
            when lockables all have the same type.
            (__detail::__try_lock_impl): Likewise. Pass lockables as
            parameters, instead of a tuple. Always lock the first one, and
            recurse for the rest.
            (__detail::__lock_impl): Adjust call to __try_lock_impl.
            (__detail::__try_to_lock): Remove.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 5f2d8f9ee7b..d1ba0c95397 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -514,39 +514,53 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// @cond undocumented
   namespace __detail
   {
+    // Lock the last lockable, after all previous ones are locked.
     template<typename _Lockable>
-      inline unique_lock<_Lockable>
-      __try_to_lock(_Lockable& __l)
-      { return unique_lock<_Lockable>{__l, try_to_lock}; }
-
-    // Lock the last element of the tuple, after all previous ones are locked.
-    template<int _Idx, typename... _Lockables>
-      inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
-      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+      inline int
+      __try_lock_impl(_Lockable& __lockable)
       {
-	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+	if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
 	  {
 	    __lock.release();
 	    return -1;
 	  }
 	else
-	  return _Idx;
+	  return 0;
       }
 
-    // Lock tuple elements starting from _Idx.
-    template<int _Idx, typename... _Lockables>
-      inline __enable_if_t<_Idx + 1 != sizeof...(_Lockables), int>
-      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+    // Lock each lockable in turn.
+    template<typename _L0, typename... _Lockables>
+      inline int
+      __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
       {
-	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+#if __cplusplus >= 201703L
+	if constexpr ((is_same_v<_L0, _Lockables> && ...))
 	  {
-	    int __idx = __detail::__try_lock_impl<_Idx + 1>(__lockables);
-	    if (__idx == -1)
-	      __lock.release();
-	    return __idx;
+	    constexpr int _Np = 1 + sizeof...(_Lockables);
+	    unique_lock<_L0> __locks[_Np] = {
+		{__l0, try_to_lock}, {__lockables, try_to_lock}...
+	    };
+	    for (int __i = 0; __i < _Np; ++__i)
+	      if (!__locks[__i])
+		return __i;
+	    for (auto& __l : __locks)
+	      __l.release();
+	    return -1;
 	  }
 	else
-	  return _Idx;
+#endif
+	if (unique_lock<_L0> __lock{__l0, try_to_lock})
+	  {
+	    int __idx = __detail::__try_lock_impl(__lockables...);
+	    if (__idx == -1)
+	      {
+		__lock.release();
+		return -1;
+	      }
+	    return __idx + 1;
+	  }
+	else
+	  return 0;
       }
 
   } // namespace __detail
@@ -562,12 +576,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *
    *  Sequentially calls try_lock() on each argument.
    */
-  template<typename _Lock1, typename _Lock2, typename... _Lock3>
+  template<typename _L1, typename _L2, typename... _L3>
     int
-    try_lock(_Lock1& __l1, _Lock2& __l2, _Lock3&... __l3)
+    try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
-      auto __lockables = std::tie(__l1, __l2, __l3...);
-      return __detail::__try_lock_impl<0>(__lockables);
+      return __detail::__try_lock_impl(__l1, __l2, __l3...);
     }
 
   /// @cond undocumented
@@ -589,8 +602,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		int __failed = 1; // index that couldn't be locked
 		{
 		  unique_lock<_L0> __first(__l0);
-		  auto __rest = std::tie(__l1...);
-		  __failed += __detail::__try_lock_impl<0>(__rest);
+		  __failed += __detail::__try_lock_impl(__l1...);
 		  if (!__failed)
 		    {
 		      __i = -1; // finished
@@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *  @post All arguments are locked.
    *
    *  All arguments are locked via a sequence of calls to lock(), try_lock()
-   *  and unlock().  If the call exits via an exception any locks that were
-   *  obtained will be released.
+   *  and unlock().  If this function exits via an exception any locks that
+   *  were obtained will be released.
    */
   template<typename _L1, typename _L2, typename... _L3>
     void
     lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
-      int __i = 0;
-      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
+#if __cplusplus >= 201703L
+      if constexpr (is_same_v<_L1, _L2> && (is_same_v<_L1, _L3> && ...))
+	{
+	  constexpr int _Np = 2 + sizeof...(_L3);
+	  unique_lock<_L1> __locks[] = {
+	      {__l1, defer_lock}, {__l2, defer_lock}, {__l3, defer_lock}...
+	  };
+	  int __first = 0;
+	  do {
+	    __locks[__first].lock();
+	    for (int __j = 1; __j < _Np; ++__j)
+	      {
+		const int __idx = (__first + __j) % _Np;
+		if (!__locks[__idx].try_lock())
+		  {
+		    for (int __k = __j; __k != 0; --__k)
+		      __locks[(__first + __k - 1) % _Np].unlock();
+		    __first = __idx;
+		    break;
+		  }
+	      }
+	  } while (!__locks[__first]);
+
+	  for (auto& __l : __locks)
+	    __l.release();
+	}
+      else
+#endif
+	{
+	  int __i = 0;
+	  __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
+	}
     }
 
 #if __cplusplus >= 201703L

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] libstdc++: Improve std::lock algorithm
  2021-06-22 12:51   ` [PATCH v2] " Jonathan Wakely
@ 2021-06-22 13:21     ` Matthias Kretz
  2021-06-22 15:20       ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Matthias Kretz @ 2021-06-22 13:21 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc Patches

On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> With your suggestion
> to also drop std::tuple the number of parameters decides which
> function we call. And we don't instantiate std::tuple. And we can also
> get rid of the __try_to_lock function, which was only used to deduce
> the lock type rather than use tuple_element to get it. That's much
> nicer.

👍

> > How about optimizing a likely common case where all lockables have the
> > same
> > type? In that case we don't require recursion and can manage stack usage
> > much
> > simpler:
> The stack usage is bounded by the number of mutexes being locked,
> which is unlikely to get large, but we can do that.

Right. I meant simpler, because it takes a while of staring at the recursive 
implementation to understand how it works. :)

> We can do it for try_lock too:
> 
>   template<typename _L1, typename _L2, typename... _L3>
>     int
>     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>     {
> #if __cplusplus >= 201703L
>       if constexpr (is_same_v<_L1, _L2>
>                     && (is_same_v<_L1, _L3> && ...))
>         {
>           constexpr int _Np = 2 + sizeof...(_L3);
>           unique_lock<_L1> __locks[_Np] = {
>               {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> try_to_lock}... };

This does a try_lock on all lockabes even if any of them fails. I think that's 
not only more expensive but also non-conforming. I think you need to defer 
locking and then loop from beginning to end to break the loop on the first 
unsuccessful try_lock.

>           for (int __i = 0; __i < _Np; ++__i)
>             if (!__locks[__i])
>               return __i;
>           for (auto& __l : __locks)
>             __l.release();
>           return -1;
>         }
>       else
> #endif
>       return __detail::__try_lock_impl(__l1, __l2, __l3...);
>     }
> 
> > if constexpr ((is_same_v<_L0, _L1> && ...))
> > {
> > constexpr int _Np = 1 + sizeof...(_L1);
> > std::array<unique_lock<_L0>, _Np> __locks = {
> > {__l0, defer_lock}, {__l1, defer_lock}...
> > };
> > int __first = 0;
> > do {
> > __locks[__first].lock();
> > for (int __j = 1; __j < _Np; ++__j)
> > {
> > const int __idx = (__first + __j) % _Np;
> > if (!__locks[__idx].try_lock())
> > {
> > for (int __k = __idx; __k != __first;
> > __k = __k == 1 ? _Np : __k - 1)
> > __locks[__k - 1].unlock();
> 
> This loop doesn't work if any try_lock fails when first==0, because
> the loop termination condition is never reached.

Uh yes. Which is the same reason why the __j loop doesn't start from __first + 
1.

> I find this a bit easier to understand than the loop above, and
> correct (I think):
> 
>   for (int __k = __j; __k != 0; --__k)
>     __locks[(__first + __k - 1) % _Np].unlock();

Yes, if only we had a wrapping integer type that wraps at an arbitrary N. Like 
unsigned int but with parameter, like:

  for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
    __locks[__k - 1].unlock();

This is the loop I wanted to write, except --__k is simpler to write and __k - 
1 would also wrap around to _Np - 1 for __k == 0. But if this is the only 
place it's not important enough to abstract.

> [...]
> @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     *  @post All arguments are locked.
>     *
>     *  All arguments are locked via a sequence of calls to lock(),
> try_lock() -   *  and unlock().  If the call exits via an exception any
> locks that were -   *  obtained will be released.
> +   *  and unlock().  If this function exits via an exception any locks that
> +   *  were obtained will be released.
>     */
>    template<typename _L1, typename _L2, typename... _L3>
>      void
>      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>      {
> -      int __i = 0;
> -      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> +#if __cplusplus >= 201703L

I also considered moving it down here. Makes sense unless you want to call 
__detail::__lock_impl from other functions. And if we want to make it work for 
pre-C++11 we could do

  using __homogeneous
    = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
  int __i = 0;
  __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);


-Matthias

> +      if constexpr (is_same_v<_L1, _L2> && (is_same_v<_L1, _L3> && ...))
> +       {
> +         constexpr int _Np = 2 + sizeof...(_L3);
> +         unique_lock<_L1> __locks[] = {
> +             {__l1, defer_lock}, {__l2, defer_lock}, {__l3, defer_lock}...
> +         };
> +         int __first = 0;
> +         do {
> +           __locks[__first].lock();
> +           for (int __j = 1; __j < _Np; ++__j)
> +             {
> +               const int __idx = (__first + __j) % _Np;
> +               if (!__locks[__idx].try_lock())
> +                 {
> +                   for (int __k = __j; __k != 0; --__k)
> +                     __locks[(__first + __k - 1) % _Np].unlock();
> +                   __first = __idx;
> +                   break;
> +                 }
> +             }
> +         } while (!__locks[__first]);
> +
> +         for (auto& __l : __locks)
> +           __l.release();
> +       }
> +      else
> +#endif
> +       {
> +         int __i = 0;
> +         __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> +       }
>      }
> 
>  #if __cplusplus >= 201703L


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] libstdc++: Improve std::lock algorithm
  2021-06-22 13:21     ` Matthias Kretz
@ 2021-06-22 15:20       ` Jonathan Wakely
  2021-06-22 16:03         ` Matthias Kretz
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-22 15:20 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: libstdc++, gcc Patches

On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:

> On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> > With your suggestion
> > to also drop std::tuple the number of parameters decides which
> > function we call. And we don't instantiate std::tuple. And we can also
> > get rid of the __try_to_lock function, which was only used to deduce
> > the lock type rather than use tuple_element to get it. That's much
> > nicer.
>
> 👍
>
> > > How about optimizing a likely common case where all lockables have the
> > > same
> > > type? In that case we don't require recursion and can manage stack
> usage
> > > much
> > > simpler:
> > The stack usage is bounded by the number of mutexes being locked,
> > which is unlikely to get large, but we can do that.
>
> Right. I meant simpler, because it takes a while of staring at the
> recursive
> implementation to understand how it works. :)
>
> > We can do it for try_lock too:
> >
> >   template<typename _L1, typename _L2, typename... _L3>
> >     int
> >     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >     {
> > #if __cplusplus >= 201703L
> >       if constexpr (is_same_v<_L1, _L2>
> >                     && (is_same_v<_L1, _L3> && ...))
> >         {
> >           constexpr int _Np = 2 + sizeof...(_L3);
> >           unique_lock<_L1> __locks[_Np] = {
> >               {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> > try_to_lock}... };
>
> This does a try_lock on all lockabes even if any of them fails. I think
> that's
> not only more expensive but also non-conforming. I think you need to defer
> locking and then loop from beginning to end to break the loop on the first
> unsuccessful try_lock.
>

Oops, good point. I'll add a test for that too. Here's the fixed code:

    template<typename _L0, typename... _Lockables>
      inline int
      __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
      {
#if __cplusplus >= 201703L
        if constexpr ((is_same_v<_L0, _Lockables> && ...))
          {
            constexpr int _Np = 1 + sizeof...(_Lockables);
            unique_lock<_L0> __locks[_Np] = {
                {__l0, defer_lock}, {__lockables, defer_lock}...
            };
            for (int __i = 0; __i < _Np; ++__i)
              if (!__locks[__i].try_lock())
                {
                  const int __failed = __i;
                  while (__i--)
                    __locks[__i].unlock();
                  return __i;
                }
            for (auto& __l : __locks)
              __l.release();
            return -1;
          }
        else
#endif




> >           for (int __i = 0; __i < _Np; ++__i)
> >             if (!__locks[__i])
> >               return __i;
> >           for (auto& __l : __locks)
> >             __l.release();
> >           return -1;
> >         }
> >       else
> > #endif
> >       return __detail::__try_lock_impl(__l1, __l2, __l3...);
> >     }
> >
> > > if constexpr ((is_same_v<_L0, _L1> && ...))
> > > {
> > > constexpr int _Np = 1 + sizeof...(_L1);
> > > std::array<unique_lock<_L0>, _Np> __locks = {
> > > {__l0, defer_lock}, {__l1, defer_lock}...
> > > };
> > > int __first = 0;
> > > do {
> > > __locks[__first].lock();
> > > for (int __j = 1; __j < _Np; ++__j)
> > > {
> > > const int __idx = (__first + __j) % _Np;
> > > if (!__locks[__idx].try_lock())
> > > {
> > > for (int __k = __idx; __k != __first;
> > > __k = __k == 1 ? _Np : __k - 1)
> > > __locks[__k - 1].unlock();
> >
> > This loop doesn't work if any try_lock fails when first==0, because
> > the loop termination condition is never reached.
>
> Uh yes. Which is the same reason why the __j loop doesn't start from
> __first +
> 1.
>
> > I find this a bit easier to understand than the loop above, and
> > correct (I think):
> >
> >   for (int __k = __j; __k != 0; --__k)
> >     __locks[(__first + __k - 1) % _Np].unlock();
>
> Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> Like
> unsigned int but with parameter, like:
>
>   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
>     __locks[__k - 1].unlock();
>
> This is the loop I wanted to write, except --__k is simpler to write and
> __k -
> 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> place it's not important enough to abstract.
>

We might be able to use __wrapping_uint in std::seed_seq::generate too, and
maybe some other places in <random>. But we can add that later if we decide
it's worth it.


> > [...]
> > @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     *  @post All arguments are locked.
> >     *
> >     *  All arguments are locked via a sequence of calls to lock(),
> > try_lock() -   *  and unlock().  If the call exits via an exception any
> > locks that were -   *  obtained will be released.
> > +   *  and unlock().  If this function exits via an exception any locks
> that
> > +   *  were obtained will be released.
> >     */
> >    template<typename _L1, typename _L2, typename... _L3>
> >      void
> >      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >      {
> > -      int __i = 0;
> > -      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> > +#if __cplusplus >= 201703L
>
> I also considered moving it down here. Makes sense unless you want to call
> __detail::__lock_impl from other functions. And if we want to make it work
> for
> pre-C++11 we could do
>
>   using __homogeneous
>     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
>   int __i = 0;
>   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
>
>
We don't need tag dispatching, we could just do:

if _GLIBCXX17_CONSTEXPR (homogeneous::value)
 ...
else
 ...

because both branches are valid for the homogeneous case, i.e. we aren't
using if-constexpr to avoid invalid instantiations. So for pre-C++17 it
would still compile both branches (and instantiate the recursive function
templates) but would use the iterative code at run time.

But given that the default -std option is gnu++17 now, I'm OK with the
iterative version only being used for C++17.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] libstdc++: Improve std::lock algorithm
  2021-06-22 15:20       ` Jonathan Wakely
@ 2021-06-22 16:03         ` Matthias Kretz
  2021-06-22 19:51           ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Matthias Kretz @ 2021-06-22 16:03 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc Patches

On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > This does a try_lock on all lockabes even if any of them fails. I think
> > that's
> > not only more expensive but also non-conforming. I think you need to defer
> > locking and then loop from beginning to end to break the loop on the first
> > unsuccessful try_lock.
> 
> Oops, good point. I'll add a test for that too. Here's the fixed code:
> 
>     template<typename _L0, typename... _Lockables>
>       inline int
>       __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
>       {
> #if __cplusplus >= 201703L
>         if constexpr ((is_same_v<_L0, _Lockables> && ...))
>           {
>             constexpr int _Np = 1 + sizeof...(_Lockables);
>             unique_lock<_L0> __locks[_Np] = {
>                 {__l0, defer_lock}, {__lockables, defer_lock}...
>             };
>             for (int __i = 0; __i < _Np; ++__i)

I thought coding style requires a { here?

>               if (!__locks[__i].try_lock())
>                 {
>                   const int __failed = __i;
>                   while (__i--)
>                     __locks[__i].unlock();
>                   return __i;

You meant `return __failed`?

>                 }
>             for (auto& __l : __locks)
>               __l.release();
>             return -1;
>           }
>         else
> #endif
> 
> > [...]
> > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > Like
> > 
> > unsigned int but with parameter, like:
> >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> >   
> >     __locks[__k - 1].unlock();
> > 
> > This is the loop I wanted to write, except --__k is simpler to write and
> > __k -
> > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > place it's not important enough to abstract.
> 
> We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> maybe some other places in <random>. But we can add that later if we decide
> it's worth it.

OK.

> > I also considered moving it down here. Makes sense unless you want to call
> > __detail::__lock_impl from other functions. And if we want to make it work
> > for
> > pre-C++11 we could do
> > 
> >   using __homogeneous
> >   
> >     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> >   
> >   int __i = 0;
> >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> 
> We don't need tag dispatching, we could just do:
> 
> if _GLIBCXX17_CONSTEXPR (homogeneous::value)
>  ...
> else
>  ...
> 
> because both branches are valid for the homogeneous case, i.e. we aren't
> using if-constexpr to avoid invalid instantiations.

But for the inhomogeneous case the homogeneous code is invalid (initialization 
of C-array of unique_lock<_L1>).

> But given that the default -std option is gnu++17 now, I'm OK with the
> iterative version only being used for C++17.

Fair enough.

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] libstdc++: Improve std::lock algorithm
  2021-06-22 16:03         ` Matthias Kretz
@ 2021-06-22 19:51           ` Jonathan Wakely
  2021-06-22 20:32             ` [PATCH v3] " Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-22 19:51 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: libstdc++, gcc Patches

On Tue, 22 Jun 2021 at 17:03, Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> > On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > > This does a try_lock on all lockabes even if any of them fails. I think
> > > that's
> > > not only more expensive but also non-conforming. I think you need to defer
> > > locking and then loop from beginning to end to break the loop on the first
> > > unsuccessful try_lock.
> >
> > Oops, good point. I'll add a test for that too. Here's the fixed code:
> >
> >     template<typename _L0, typename... _Lockables>
> >       inline int
> >       __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
> >       {
> > #if __cplusplus >= 201703L
> >         if constexpr ((is_same_v<_L0, _Lockables> && ...))
> >           {
> >             constexpr int _Np = 1 + sizeof...(_Lockables);
> >             unique_lock<_L0> __locks[_Np] = {
> >                 {__l0, defer_lock}, {__lockables, defer_lock}...
> >             };
> >             for (int __i = 0; __i < _Np; ++__i)
>
> I thought coding style requires a { here?

Maybe for the compiler, but I don't think libstdc++ has such a rule. I
can add the braces though, it's probably better.

>
> >               if (!__locks[__i].try_lock())
> >                 {
> >                   const int __failed = __i;
> >                   while (__i--)
> >                     __locks[__i].unlock();
> >                   return __i;
>
> You meant `return __failed`?

Yep, copy&paste error while trying to avoid the TABs in the real code
screwing up the gmail formatting :-(


> >                 }
> >             for (auto& __l : __locks)
> >               __l.release();
> >             return -1;
> >           }
> >         else
> > #endif
> >
> > > [...]
> > > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > > Like
> > >
> > > unsigned int but with parameter, like:
> > >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> > >
> > >     __locks[__k - 1].unlock();
> > >
> > > This is the loop I wanted to write, except --__k is simpler to write and
> > > __k -
> > > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > > place it's not important enough to abstract.
> >
> > We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> > maybe some other places in <random>. But we can add that later if we decide
> > it's worth it.
>
> OK.
>
> > > I also considered moving it down here. Makes sense unless you want to call
> > > __detail::__lock_impl from other functions. And if we want to make it work
> > > for
> > > pre-C++11 we could do
> > >
> > >   using __homogeneous
> > >
> > >     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> > >
> > >   int __i = 0;
> > >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> >
> > We don't need tag dispatching, we could just do:
> >
> > if _GLIBCXX17_CONSTEXPR (homogeneous::value)
> >  ...
> > else
> >  ...
> >
> > because both branches are valid for the homogeneous case, i.e. we aren't
> > using if-constexpr to avoid invalid instantiations.
>
> But for the inhomogeneous case the homogeneous code is invalid (initialization
> of C-array of unique_lock<_L1>).

Oops, yeah of course.

>
> > But given that the default -std option is gnu++17 now, I'm OK with the
> > iterative version only being used for C++17.
>
> Fair enough.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-22 19:51           ` Jonathan Wakely
@ 2021-06-22 20:32             ` Jonathan Wakely
  2021-06-23  7:21               ` Christophe Lyon
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-22 20:32 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: libstdc++, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 3789 bytes --]

On Tue, 22 Jun 2021 at 20:51, Jonathan Wakely wrote:
>
> On Tue, 22 Jun 2021 at 17:03, Matthias Kretz <m.kretz@gsi.de> wrote:
> >
> > On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> > > On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > > > This does a try_lock on all lockabes even if any of them fails. I think
> > > > that's
> > > > not only more expensive but also non-conforming. I think you need to defer
> > > > locking and then loop from beginning to end to break the loop on the first
> > > > unsuccessful try_lock.
> > >
> > > Oops, good point. I'll add a test for that too. Here's the fixed code:
> > >
> > >     template<typename _L0, typename... _Lockables>
> > >       inline int
> > >       __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
> > >       {
> > > #if __cplusplus >= 201703L
> > >         if constexpr ((is_same_v<_L0, _Lockables> && ...))
> > >           {
> > >             constexpr int _Np = 1 + sizeof...(_Lockables);
> > >             unique_lock<_L0> __locks[_Np] = {
> > >                 {__l0, defer_lock}, {__lockables, defer_lock}...
> > >             };
> > >             for (int __i = 0; __i < _Np; ++__i)
> >
> > I thought coding style requires a { here?
>
> Maybe for the compiler, but I don't think libstdc++ has such a rule. I
> can add the braces though, it's probably better.
>
> >
> > >               if (!__locks[__i].try_lock())
> > >                 {
> > >                   const int __failed = __i;
> > >                   while (__i--)
> > >                     __locks[__i].unlock();
> > >                   return __i;
> >
> > You meant `return __failed`?
>
> Yep, copy&paste error while trying to avoid the TABs in the real code
> screwing up the gmail formatting :-(
>
>
> > >                 }
> > >             for (auto& __l : __locks)
> > >               __l.release();
> > >             return -1;
> > >           }
> > >         else
> > > #endif
> > >
> > > > [...]
> > > > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > > > Like
> > > >
> > > > unsigned int but with parameter, like:
> > > >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> > > >
> > > >     __locks[__k - 1].unlock();
> > > >
> > > > This is the loop I wanted to write, except --__k is simpler to write and
> > > > __k -
> > > > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > > > place it's not important enough to abstract.
> > >
> > > We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> > > maybe some other places in <random>. But we can add that later if we decide
> > > it's worth it.
> >
> > OK.
> >
> > > > I also considered moving it down here. Makes sense unless you want to call
> > > > __detail::__lock_impl from other functions. And if we want to make it work
> > > > for
> > > > pre-C++11 we could do
> > > >
> > > >   using __homogeneous
> > > >
> > > >     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> > > >
> > > >   int __i = 0;
> > > >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> > >
> > > We don't need tag dispatching, we could just do:
> > >
> > > if _GLIBCXX17_CONSTEXPR (homogeneous::value)
> > >  ...
> > > else
> > >  ...
> > >
> > > because both branches are valid for the homogeneous case, i.e. we aren't
> > > using if-constexpr to avoid invalid instantiations.
> >
> > But for the inhomogeneous case the homogeneous code is invalid (initialization
> > of C-array of unique_lock<_L1>).
>
> Oops, yeah of course.
>
> >
> > > But given that the default -std option is gnu++17 now, I'm OK with the
> > > iterative version only being used for C++17.
> >
> > Fair enough.

Here's what I've tested and pushed to trunk. Thanks for the
improvement and comments.

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 12128 bytes --]

commit c556596119307f9ef1c9079ef2bd3a035dea355d
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Jun 22 13:35:19 2021

    libstdc++: Simplify std::try_lock and std::lock further
    
    The std::try_lock and std::lock algorithms can use iteration instead of
    recursion when all lockables have the same type and can be held by an
    array of unique_lock<L> objects.
    
    By making this change to __detail::__try_lock_impl it also benefits
    __detail::__lock_impl, which uses it. For std::lock we can just put the
    iterative version directly in std::lock, to avoid making any call to
    __detail::__lock_impl.
    
    Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
    
    Co-authored-by: Matthias Kretz <m.kretz@gsi.de>
    
    libstdc++-v3/ChangeLog:
    
            * include/std/mutex (lock): Replace recursion with iteration
            when lockables all have the same type.
            (__detail::__try_lock_impl): Likewise. Pass lockables as
            parameters, instead of a tuple. Always lock the first one, and
            recurse for the rest.
            (__detail::__lock_impl): Adjust call to __try_lock_impl.
            (__detail::__try_to_lock): Remove.
            * testsuite/30_threads/lock/3.cc: Check that mutexes are locked.
            * testsuite/30_threads/lock/4.cc: Also test non-heterogeneous
            arguments.
            * testsuite/30_threads/unique_lock/cons/60497.cc: Also check
            std::try_lock.
            * testsuite/30_threads/try_lock/5.cc: New test.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 5f2d8f9ee7b..c18ca1a1955 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -514,39 +514,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// @cond undocumented
   namespace __detail
   {
+    // Lock the last lockable, after all previous ones are locked.
     template<typename _Lockable>
-      inline unique_lock<_Lockable>
-      __try_to_lock(_Lockable& __l)
-      { return unique_lock<_Lockable>{__l, try_to_lock}; }
-
-    // Lock the last element of the tuple, after all previous ones are locked.
-    template<int _Idx, typename... _Lockables>
-      inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
-      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+      inline int
+      __try_lock_impl(_Lockable& __lockable)
       {
-	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+	if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
 	  {
 	    __lock.release();
 	    return -1;
 	  }
 	else
-	  return _Idx;
+	  return 0;
       }
 
-    // Lock tuple elements starting from _Idx.
-    template<int _Idx, typename... _Lockables>
-      inline __enable_if_t<_Idx + 1 != sizeof...(_Lockables), int>
-      __try_lock_impl(tuple<_Lockables&...>& __lockables)
+    // Lock each lockable in turn.
+    // Use iteration if all lockables are the same type, recursion otherwise.
+    template<typename _L0, typename... _Lockables>
+      inline int
+      __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
       {
-	if (auto __lock = __detail::__try_to_lock(std::get<_Idx>(__lockables)))
+#if __cplusplus >= 201703L
+	if constexpr ((is_same_v<_L0, _Lockables> && ...))
 	  {
-	    int __idx = __detail::__try_lock_impl<_Idx + 1>(__lockables);
-	    if (__idx == -1)
-	      __lock.release();
-	    return __idx;
+	    constexpr int _Np = 1 + sizeof...(_Lockables);
+	    unique_lock<_L0> __locks[_Np] = {
+		{__l0, defer_lock}, {__lockables, defer_lock}...
+	    };
+	    for (int __i = 0; __i < _Np; ++__i)
+	      {
+		if (!__locks[__i].try_lock())
+		  {
+		    const int __failed = __i;
+		    while (__i--)
+		      __locks[__i].unlock();
+		    return __failed;
+		  }
+	      }
+	    for (auto& __l : __locks)
+	      __l.release();
+	    return -1;
 	  }
 	else
-	  return _Idx;
+#endif
+	if (unique_lock<_L0> __lock{__l0, try_to_lock})
+	  {
+	    int __idx = __detail::__try_lock_impl(__lockables...);
+	    if (__idx == -1)
+	      {
+		__lock.release();
+		return -1;
+	      }
+	    return __idx + 1;
+	  }
+	else
+	  return 0;
       }
 
   } // namespace __detail
@@ -562,12 +584,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *
    *  Sequentially calls try_lock() on each argument.
    */
-  template<typename _Lock1, typename _Lock2, typename... _Lock3>
+  template<typename _L1, typename _L2, typename... _L3>
     int
-    try_lock(_Lock1& __l1, _Lock2& __l2, _Lock3&... __l3)
+    try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
-      auto __lockables = std::tie(__l1, __l2, __l3...);
-      return __detail::__try_lock_impl<0>(__lockables);
+      return __detail::__try_lock_impl(__l1, __l2, __l3...);
     }
 
   /// @cond undocumented
@@ -589,8 +610,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		int __failed = 1; // index that couldn't be locked
 		{
 		  unique_lock<_L0> __first(__l0);
-		  auto __rest = std::tie(__l1...);
-		  __failed += __detail::__try_lock_impl<0>(__rest);
+		  __failed += __detail::__try_lock_impl(__l1...);
 		  if (!__failed)
 		    {
 		      __i = -1; // finished
@@ -620,15 +640,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *  @post All arguments are locked.
    *
    *  All arguments are locked via a sequence of calls to lock(), try_lock()
-   *  and unlock().  If the call exits via an exception any locks that were
-   *  obtained will be released.
+   *  and unlock().  If this function exits via an exception any locks that
+   *  were obtained will be released.
    */
   template<typename _L1, typename _L2, typename... _L3>
     void
     lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
-      int __i = 0;
-      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
+#if __cplusplus >= 201703L
+      if constexpr (is_same_v<_L1, _L2> && (is_same_v<_L1, _L3> && ...))
+	{
+	  constexpr int _Np = 2 + sizeof...(_L3);
+	  unique_lock<_L1> __locks[] = {
+	      {__l1, defer_lock}, {__l2, defer_lock}, {__l3, defer_lock}...
+	  };
+	  int __first = 0;
+	  do {
+	    __locks[__first].lock();
+	    for (int __j = 1; __j < _Np; ++__j)
+	      {
+		const int __idx = (__first + __j) % _Np;
+		if (!__locks[__idx].try_lock())
+		  {
+		    for (int __k = __j; __k != 0; --__k)
+		      __locks[(__first + __k - 1) % _Np].unlock();
+		    __first = __idx;
+		    break;
+		  }
+	      }
+	  } while (!__locks[__first].owns_lock());
+
+	  for (auto& __l : __locks)
+	    __l.release();
+	}
+      else
+#endif
+	{
+	  int __i = 0;
+	  __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
+	}
     }
 
 #if __cplusplus >= 201703L
diff --git a/libstdc++-v3/testsuite/30_threads/lock/3.cc b/libstdc++-v3/testsuite/30_threads/lock/3.cc
index 52136077be8..5fa118768bd 100644
--- a/libstdc++-v3/testsuite/30_threads/lock/3.cc
+++ b/libstdc++-v3/testsuite/30_threads/lock/3.cc
@@ -37,7 +37,7 @@ struct user_lock
     is_locked = true;
   }
 
-  bool try_lock() 
+  bool try_lock()
   { return is_locked ? false : (is_locked = true); }
 
   void unlock()
@@ -62,6 +62,8 @@ int main()
 	{
 	  //heterogeneous types
 	  std::lock(m1, m2, m3);
+	  VERIFY( !m1.try_lock() );
+	  VERIFY( !m3.try_lock() );
 	  m1.unlock();
 	  m2.unlock();
 	  m3.unlock();
diff --git a/libstdc++-v3/testsuite/30_threads/lock/4.cc b/libstdc++-v3/testsuite/30_threads/lock/4.cc
index 7ba15cba84b..130c1f62d73 100644
--- a/libstdc++-v3/testsuite/30_threads/lock/4.cc
+++ b/libstdc++-v3/testsuite/30_threads/lock/4.cc
@@ -73,6 +73,7 @@ int unreliable_lock::lock_on = -1;
 void test01()
 {
   unreliable_lock l1, l2, l3;
+  std::mutex m1, m2, m3;
 
   try
     {
@@ -87,6 +88,60 @@ void test01()
     {
       VERIFY( false );
     }
+
+  // Repeat with non-heterogeneous arguments
+
+  try
+    {
+      unreliable_lock::count = 0;
+      std::lock(l1, l2, l3, m1);
+      VERIFY( unreliable_lock::count == 3 );
+      l1.unlock();
+      l2.unlock();
+      l3.unlock();
+      VERIFY( !m1.try_lock() ); // already locked
+      m1.unlock();
+    }
+  catch (...)
+    {
+      VERIFY( false );
+    }
+
+  try
+    {
+      unreliable_lock::count = 0;
+      std::lock(m1, l1, l2, l3);
+      VERIFY( unreliable_lock::count == 3 );
+      VERIFY( !m1.try_lock() ); // already locked
+      m1.unlock();
+      l1.unlock();
+      l2.unlock();
+      l3.unlock();
+    }
+  catch (...)
+    {
+      VERIFY( false );
+    }
+
+  try
+    {
+      unreliable_lock::count = 0;
+      std::lock(l1, m1, l2, m2, l3, m3);
+      VERIFY( unreliable_lock::count == 3 );
+      l1.unlock();
+      l2.unlock();
+      l3.unlock();
+      VERIFY( !m1.try_lock() ); // already locked
+      VERIFY( !m2.try_lock() ); // already locked
+      VERIFY( !m3.try_lock() ); // already locked
+      m1.unlock();
+      m2.unlock();
+      m3.unlock();
+    }
+  catch (...)
+    {
+      VERIFY( false );
+    }
 }
 
 void test02()
@@ -111,6 +166,31 @@ void test02()
     {
       VERIFY( false );
     }
+
+  // Repeat with non-heterogeneous arguments
+
+  try
+    {
+      unreliable_lock::lock_on = 1;
+      while (unreliable_lock::lock_on < 3)
+      {
+        unreliable_lock::count = 0;
+        unreliable_lock l1, l2, l3;
+	std::mutex m1;
+        std::lock(l1, l2, l3, m1);
+        VERIFY( unreliable_lock::count > 3 );
+        l1.unlock();
+        l2.unlock();
+        l3.unlock();
+	VERIFY( !m1.try_lock() ); // already locked
+        m1.unlock();
+        ++unreliable_lock::lock_on;
+      }
+    }
+  catch (...)
+    {
+      VERIFY( false );
+    }
 }
 
 void test03()
@@ -133,6 +213,50 @@ void test03()
     VERIFY( test );
     ++unreliable_lock::throw_on;
   }
+
+  // Repeat with non-heterogeneous arguments
+
+  unreliable_lock::throw_on = 0;
+  while (unreliable_lock::throw_on < 3)
+  {
+    unreliable_lock::count = 0;
+    unreliable_lock l1, l2, l3;
+    std::mutex m1;
+    bool test = false;
+    try
+      {
+        std::lock(l1, l2, l3, m1);
+      }
+    catch (...)
+      {
+        test = true;
+      }
+    VERIFY( test );
+    VERIFY( m1.try_lock() ); // m1 was not left locked by failed std::lock
+    m1.unlock();
+    ++unreliable_lock::throw_on;
+  }
+
+  unreliable_lock::throw_on = 0;
+  while (unreliable_lock::throw_on < 3)
+  {
+    unreliable_lock::count = 0;
+    unreliable_lock l1, l2, l3;
+    std::mutex m1;
+    bool test = false;
+    try
+      {
+        std::lock(m1, l1, l2, l3);
+      }
+    catch (...)
+      {
+        test = true;
+      }
+    VERIFY( test );
+    VERIFY( m1.try_lock() ); // m1 was not left locked by failed std::lock
+    m1.unlock();
+    ++unreliable_lock::throw_on;
+  }
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/5.cc b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
new file mode 100644
index 00000000000..a5574ff01fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
@@ -0,0 +1,41 @@
+// { dg-do run { target c++11 } }
+
+#include <mutex>
+#include <testsuite_hooks.h>
+
+struct Lockable
+{
+  static int tries;
+
+  void lock() { }
+  void unlock() { }
+  bool try_lock() { return ++tries != 2; }
+};
+
+int Lockable::tries = 0;
+
+void test01()
+{
+  Lockable l1, l2, l3;
+  std::mutex m1, m2;
+
+  VERIFY( std::try_lock(l1, l2, l3) == 1 );
+  VERIFY( Lockable::tries == 2 );
+
+  Lockable::tries = 0;
+  VERIFY( std::try_lock(m1, l1, l2, l3) == 2 );
+  VERIFY( Lockable::tries == 2 );
+
+  Lockable::tries = 0;
+  VERIFY( std::try_lock(l1, l2, l3, m1) == 1 );
+  VERIFY( Lockable::tries == 2 );
+
+  Lockable::tries = 0;
+  VERIFY( std::try_lock(m1, l1, l2, l3, m2) == 2 );
+  VERIFY( Lockable::tries == 2 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/30_threads/unique_lock/cons/60497.cc b/libstdc++-v3/testsuite/30_threads/unique_lock/cons/60497.cc
index 8603e56790e..08698cea783 100644
--- a/libstdc++-v3/testsuite/30_threads/unique_lock/cons/60497.cc
+++ b/libstdc++-v3/testsuite/30_threads/unique_lock/cons/60497.cc
@@ -46,3 +46,9 @@ void test02()
   test_type l1, l2, l3;
   std::lock(l1, l2, l3);
 }
+
+void test03()
+{
+  test_type l1, l2, l3;
+  std::try_lock(l1, l2, l3);
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-22 20:32             ` [PATCH v3] " Jonathan Wakely
@ 2021-06-23  7:21               ` Christophe Lyon
  2021-06-23  9:17                 ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe Lyon @ 2021-06-23  7:21 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Matthias Kretz, libstdc++, gcc Patches

Hi Jonathan,

On Tue, 22 Jun 2021 at 22:34, Jonathan Wakely via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, 22 Jun 2021 at 20:51, Jonathan Wakely wrote:
> >
> > On Tue, 22 Jun 2021 at 17:03, Matthias Kretz <m.kretz@gsi.de> wrote:
> > >
> > > On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> > > > On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > > > > This does a try_lock on all lockabes even if any of them fails. I think
> > > > > that's
> > > > > not only more expensive but also non-conforming. I think you need to defer
> > > > > locking and then loop from beginning to end to break the loop on the first
> > > > > unsuccessful try_lock.
> > > >
> > > > Oops, good point. I'll add a test for that too. Here's the fixed code:
> > > >
> > > >     template<typename _L0, typename... _Lockables>
> > > >       inline int
> > > >       __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
> > > >       {
> > > > #if __cplusplus >= 201703L
> > > >         if constexpr ((is_same_v<_L0, _Lockables> && ...))
> > > >           {
> > > >             constexpr int _Np = 1 + sizeof...(_Lockables);
> > > >             unique_lock<_L0> __locks[_Np] = {
> > > >                 {__l0, defer_lock}, {__lockables, defer_lock}...
> > > >             };
> > > >             for (int __i = 0; __i < _Np; ++__i)
> > >
> > > I thought coding style requires a { here?
> >
> > Maybe for the compiler, but I don't think libstdc++ has such a rule. I
> > can add the braces though, it's probably better.
> >
> > >
> > > >               if (!__locks[__i].try_lock())
> > > >                 {
> > > >                   const int __failed = __i;
> > > >                   while (__i--)
> > > >                     __locks[__i].unlock();
> > > >                   return __i;
> > >
> > > You meant `return __failed`?
> >
> > Yep, copy&paste error while trying to avoid the TABs in the real code
> > screwing up the gmail formatting :-(
> >
> >
> > > >                 }
> > > >             for (auto& __l : __locks)
> > > >               __l.release();
> > > >             return -1;
> > > >           }
> > > >         else
> > > > #endif
> > > >
> > > > > [...]
> > > > > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > > > > Like
> > > > >
> > > > > unsigned int but with parameter, like:
> > > > >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> > > > >
> > > > >     __locks[__k - 1].unlock();
> > > > >
> > > > > This is the loop I wanted to write, except --__k is simpler to write and
> > > > > __k -
> > > > > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > > > > place it's not important enough to abstract.
> > > >
> > > > We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> > > > maybe some other places in <random>. But we can add that later if we decide
> > > > it's worth it.
> > >
> > > OK.
> > >
> > > > > I also considered moving it down here. Makes sense unless you want to call
> > > > > __detail::__lock_impl from other functions. And if we want to make it work
> > > > > for
> > > > > pre-C++11 we could do
> > > > >
> > > > >   using __homogeneous
> > > > >
> > > > >     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> > > > >
> > > > >   int __i = 0;
> > > > >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> > > >
> > > > We don't need tag dispatching, we could just do:
> > > >
> > > > if _GLIBCXX17_CONSTEXPR (homogeneous::value)
> > > >  ...
> > > > else
> > > >  ...
> > > >
> > > > because both branches are valid for the homogeneous case, i.e. we aren't
> > > > using if-constexpr to avoid invalid instantiations.
> > >
> > > But for the inhomogeneous case the homogeneous code is invalid (initialization
> > > of C-array of unique_lock<_L1>).
> >
> > Oops, yeah of course.
> >
> > >
> > > > But given that the default -std option is gnu++17 now, I'm OK with the
> > > > iterative version only being used for C++17.
> > >
> > > Fair enough.
>
> Here's what I've tested and pushed to trunk. Thanks for the
> improvement and comments.

This patch causes GCC build failures for bare-metal targets
(aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:

In file included from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
                 from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
In function 'int std::__detail::__try_lock_impl(_Lockable&)':
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
error: expected primary-expression before ',' token
  522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
      |                                                     ^
In file included from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
                 from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
In function 'int std::__detail::__try_lock_impl(_Lockable&)':
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
error: expected primary-expression before ',' token
  522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
      |                                                     ^
make[4]: *** [Makefile:1862:
arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1

Can you have a look?

Thanks

Christophe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-23  7:21               ` Christophe Lyon
@ 2021-06-23  9:17                 ` Jonathan Wakely
  2021-06-23  9:43                   ` Christophe LYON
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-23  9:17 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Matthias Kretz, libstdc++, gcc Patches

On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:
> This patch causes GCC build failures for bare-metal targets
> (aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:
>
> In file included from
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>                  from
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> error: expected primary-expression before ',' token
>   522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>       |                                                     ^
> In file included from
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>                  from
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> error: expected primary-expression before ',' token
>   522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>       |                                                     ^
> make[4]: *** [Makefile:1862:
> arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1

Sigh, __lockable is a macro in newlib:

newlib/libc/include/sys/cdefs.h:#define __lockable
__lock_annotate(lockable)

I'll change the name.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-23  9:17                 ` Jonathan Wakely
@ 2021-06-23  9:43                   ` Christophe LYON
  2021-06-23 10:11                     ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Christophe LYON @ 2021-06-23  9:43 UTC (permalink / raw)
  To: libstdc++


On 23/06/2021 11:17, Jonathan Wakely via Libstdc++ wrote:
> On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:
>> This patch causes GCC build failures for bare-metal targets
>> (aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:
>>
>> In file included from
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>>                   from
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
>> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
>> error: expected primary-expression before ',' token
>>    522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>>        |                                                     ^
>> In file included from
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>>                   from
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
>> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
>> error: expected primary-expression before ',' token
>>    522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>>        |                                                     ^
>> make[4]: *** [Makefile:1862:
>> arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> Sigh, __lockable is a macro in newlib:
>
> newlib/libc/include/sys/cdefs.h:#define __lockable
> __lock_annotate(lockable)
>
> I'll change the name.

Thanks (sorry for being back ;-)



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-23  9:43                   ` Christophe LYON
@ 2021-06-23 10:11                     ` Jonathan Wakely
  2021-06-23 11:16                       ` Christophe LYON
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2021-06-23 10:11 UTC (permalink / raw)
  To: Christophe LYON; +Cc: libstdc++, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 2319 bytes --]

On Wed, 23 Jun 2021 at 10:43, Christophe LYON wrote:
>
>
> On 23/06/2021 11:17, Jonathan Wakely via Libstdc++ wrote:
> > On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:
> >> This patch causes GCC build failures for bare-metal targets
> >> (aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:
> >>
> >> In file included from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
> >>                   from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> >> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> >> error: expected primary-expression before ',' token
> >>    522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
> >>        |                                                     ^
> >> In file included from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
> >>                   from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> >> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> >> error: expected primary-expression before ',' token
> >>    522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
> >>        |                                                     ^
> >> make[4]: *** [Makefile:1862:
> >> arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> > Sigh, __lockable is a macro in newlib:
> >
> > newlib/libc/include/sys/cdefs.h:#define __lockable
> > __lock_annotate(lockable)
> >
> > I'll change the name.
>
> Thanks (sorry for being back ;-)

Fixed with this patch, tested x86_64 and pushed to trunk.

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 2785 bytes --]

commit 75404109dce57d2f8dac0f90808010233928418f
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Wed Jun 23 11:05:51 2021

    libstdc++: Avoid "__lockable" name defined as macro by newlib
    
    libstdc++-v3/ChangeLog:
    
            * include/std/mutex (__detail::__try_lock_impl): Rename
            parameter to avoid clashing with newlib's __lockable macro.
            (try_lock): Add 'inline' specifier.
            * testsuite/17_intro/names.cc: Add check for __lockable.
            * testsuite/30_threads/try_lock/5.cc: Add options for pthreads.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index c18ca1a1955..eeb51fdb840 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -517,9 +517,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     // Lock the last lockable, after all previous ones are locked.
     template<typename _Lockable>
       inline int
-      __try_lock_impl(_Lockable& __lockable)
+      __try_lock_impl(_Lockable& __l)
       {
-	if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
+	if (unique_lock<_Lockable> __lock{__l, try_to_lock})
 	  {
 	    __lock.release();
 	    return -1;
@@ -585,7 +585,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    *  Sequentially calls try_lock() on each argument.
    */
   template<typename _L1, typename _L2, typename... _L3>
-    int
+    inline int
     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
     {
       return __detail::__try_lock_impl(__l1, __l2, __l3...);
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 624e3ed9ccf..534dab70ff5 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -16,6 +16,7 @@
 // <http://www.gnu.org/licenses/>.
 
 // { dg-do compile }
+// { dg-add-options no_pch }
 
 // Define macros for some common variables names that we must not use for
 // naming variables, parameters etc. in the library.
@@ -216,6 +217,11 @@
 #undef y
 #endif
 
+#if ! __has_include(<newlib.h>)
+// newlib's <sys/cdefs.h> defines __lockable as a macro, so we can't use it.
+# define __lockable		cannot be used as an identifier
+#endif
+
 #ifdef __sun__
 // See https://gcc.gnu.org/ml/libstdc++/2019-05/msg00175.html
 #undef ptr
diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/5.cc b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
index a5574ff01fb..b9ce1cc9b90 100644
--- a/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
+++ b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
@@ -1,4 +1,7 @@
-// { dg-do run { target c++11 } }
+// { dg-do run }
+// { dg-additional-options "-pthread" { target pthread } }
+// { dg-require-effective-target c++11 }
+// { dg-require-gthreads "" }
 
 #include <mutex>
 #include <testsuite_hooks.h>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] libstdc++: Improve std::lock algorithm
  2021-06-23 10:11                     ` Jonathan Wakely
@ 2021-06-23 11:16                       ` Christophe LYON
  0 siblings, 0 replies; 13+ messages in thread
From: Christophe LYON @ 2021-06-23 11:16 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc Patches


On 23/06/2021 12:11, Jonathan Wakely wrote:
> On Wed, 23 Jun 2021 at 10:43, Christophe LYON wrote:
>>
>> On 23/06/2021 11:17, Jonathan Wakely via Libstdc++ wrote:
>>> On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:
>>>> This patch causes GCC build failures for bare-metal targets
>>>> (aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:
>>>>
>>>> In file included from
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>>>>                    from
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
>>>> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
>>>> error: expected primary-expression before ',' token
>>>>     522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>>>>         |                                                     ^
>>>> In file included from
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
>>>>                    from
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
>>>> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
>>>> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
>>>> error: expected primary-expression before ',' token
>>>>     522 |         if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
>>>>         |                                                     ^
>>>> make[4]: *** [Makefile:1862:
>>>> arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
>>> Sigh, __lockable is a macro in newlib:
>>>
>>> newlib/libc/include/sys/cdefs.h:#define __lockable
>>> __lock_annotate(lockable)
>>>
>>> I'll change the name.
>> Thanks (sorry for being back ;-)
> Fixed with this patch, tested x86_64 and pushed to trunk.

Thanks for the prompt fix, I confirm it's OK again on my side.


Christophe



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-06-23 11:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-21 17:31 [committed] libstdc++: Improve std::lock algorithm Jonathan Wakely
2021-06-22  9:07 ` Matthias Kretz
2021-06-22 12:51   ` [PATCH v2] " Jonathan Wakely
2021-06-22 13:21     ` Matthias Kretz
2021-06-22 15:20       ` Jonathan Wakely
2021-06-22 16:03         ` Matthias Kretz
2021-06-22 19:51           ` Jonathan Wakely
2021-06-22 20:32             ` [PATCH v3] " Jonathan Wakely
2021-06-23  7:21               ` Christophe Lyon
2021-06-23  9:17                 ` Jonathan Wakely
2021-06-23  9:43                   ` Christophe LYON
2021-06-23 10:11                     ` Jonathan Wakely
2021-06-23 11:16                       ` Christophe LYON

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).