public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
@ 2020-05-29  6:17 Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 1/8] libstdc++: Improve async test Mike Crowe
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

This series ensures that the std::future::wait_* functions use
std::chrono::steady_clock when required, introduces
std::chrono::__detail::ceil to make that easier to do, and then makes
use of that function to simplify and improve the fix for PR68519 in
std::condition_variable.

v1 of this series was originally posted back in September 2017 (see
https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html )

v2 of this series was originally posted back in January 2018 (see
https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html )

v3 of this series was originally posted back in August 2018 (see
https://gcc.gnu.org/ml/libstdc++/2018-08/msg00001.html )

v4 of this series was originally posted back in October 2019 (see
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01934.html )

Changes since v4:

* Expose std::chrono::ceil as std::chrono::__detail::ceil so that it
  can be used to fix PR91486 in std::future::wait_for (as suggested by
  John Salmon in PR91486.)

* Use std::chrono::__detail::ceil to simplify fix for PR68519 in
  std::condition_variable::wait_for.

* Also fix equivalent of PR68519 in
  std::condition_variable::wait_until and add test.

Changelog:
            * libstdc++-v3/include/std/condition_variable:
              (condition_variable::wait_until): Convert delta to
              steady_clock duration before adding to current steady_clock
              time to avoid rounding errors described in
              PR68519. (condition_variable::wait_for): Simplify calculation
              of absolute time by using chrono::__detail::ceil in both
              overloads.  *
              libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc:
              (test_wait_for): Renamed from test01. Replace unassigned val
              variable with constant false. Reduce scope of mx and cv
              variables to just test_wait_for function. (test_wait_until):
              Add new test case.

            * libstdc++-v3/include/std/chrono: (__detail::ceil) Move
              implementation of std::chrono::ceil into private namespace so
              that it's available to pre-C++17 code.

            * libstdc++-v3/include/bits/atomic_futex.h:
              (__atomic_futex_unsigned::_M_load_when_equal_for,
              __atomic_futex_unsigned::_M_load_when_equal_until): Use
              __detail::ceil to convert delta to the reference clock
              duration type to avoid resolution problems

            * libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486):
              New test for __atomic_futex_unsigned::_M_load_when_equal_for.

    * run test03 with steady_clock_copy, which behaves identically to
      std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
      causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
      that takes an arbitrary clock to be called.

    * invent test04 which uses a deliberately slow running clock in order to
      exercise the looping behaviour o
      __atomic_futex_unsigned::_M_load_when_equal_until described above.

            * libstdc++-v3/include/bits/atomic_futex.h:
            (__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
            generic _Clock to check the timeout against _Clock again after
            _M_load_when_equal_until returns indicating a timeout.

            * libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
            slow_clock that runs at an eleventh of steady_clock's speed. Use it
            to test the user-supplied-clock variant of
            __atomic_futex_unsigned::_M_load_when_equal_until works generally
            with test03 and loops correctly when the timeout time hasn't been
            reached in test04.

            * libstdc++-v3/include/bits/atomic_futex.h:
            (__atomic_futex_unsigned): Change __clock_t typedef to use
            steady_clock so that unknown clocks are synced to it rather than
            system_clock. Change existing __clock_t overloads of
            _M_load_and_text_until_impl and _M_load_when_equal_until to use
            system_clock explicitly. Remove comment about DR 887 since these
            changes address that problem as best as we currently able.

            * libstdc++-v3/config/abi/pre/gnu.ver: Update for addition of
              __atomic_futex_unsigned_base::_M_futex_wait_until_steady.

            * libstdc++-v3/include/bits/atomic_futex.h
              (__atomic_futex_unsigned_base): Add comments to clarify that
              _M_futex_wait_until _M_load_and_test_until use CLOCK_REALTIME.
              Declare new _M_futex_wait_until_steady and
              _M_load_and_text_until_steady methods that use CLOCK_MONOTONIC.
              Add _M_load_and_test_until_impl and _M_load_when_equal_until
              overloads that accept a steady_clock time_point and use these new
              methods.

            * libstdc++-v3/src/c++11/futex.cc: Include headers required for
            clock_gettime. Add futex_clock_monotonic_flag constant to tell
            futex to use CLOCK_MONOTONIC to match the existing
            futex_clock_realtime_flag.  Add futex_clock_monotonic_unavailable
            to store the result of trying to use
            CLOCK_MONOTONIC. (__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
            Add new variant of _M_futex_wait_until that uses CLOCK_MONOTONIC to
            support waiting using steady_clock.

            * libstdc++-v3/src/c++11/futex.cc: Add new constants for required
            futex flags.  Add futex_clock_realtime_unavailable flag to store
            result of trying to use
            FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
            Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
            fall back to using gettimeofday and FUTEX_WAIT if that's not
            supported.

             * libstdc++-v3/testsuite/30_threads/async/async.cc (test02): Test
             steady_clock with std::future::wait_until.  (test03): Add new test
             templated on clock type waiting for future associated with async
             to resolve.  (main): Call test03 to test both system_clock and
             steady_clock.

Mike Crowe (8):
  libstdc++: Improve async test
  libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  libstdc++ futex: Support waiting on std::chrono::steady_clock directly
  libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock
  libstdc++ futex: Loop when waiting against arbitrary clock
  libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  libstdc++ condition_variable: Avoid rounding errors on custom clocks
  libstdc++: Extra async tests, not for merging

 libstdc++-v3/config/abi/pre/gnu.ver                                   |  10 ++--
 libstdc++-v3/include/bits/atomic_futex.h                              |  93 +++++++++++++++++++++++++++++++-----
 libstdc++-v3/include/std/chrono                                       |  19 +++++--
 libstdc++-v3/include/std/condition_variable                           |  18 +++----
 libstdc++-v3/src/c++11/futex.cc                                       | 119 ++++++++++++++++++++++++++++++++++++++++++++++-
 libstdc++-v3/testsuite/30_threads/async/async.cc                      | 188 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc |  61 ++++++++++++++++++++---
 7 files changed, 473 insertions(+), 35 deletions(-)

base-commit: 6ce3d791dfcba469e709935aba5743640f7d4959
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 1/8] libstdc++: Improve async test
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait Mike Crowe
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Add tests for waiting for the future using both std::chrono::steady_clock
and std::chrono::system_clock in preparation for dealing with those clocks
properly in futex.cc.

	 * libstdc++-v3/testsuite/30_threads/async/async.cc (test02): Test
	 steady_clock with std::future::wait_until.  (test03): Add new test
	 templated on clock type waiting for future associated with async
	 to resolve.  (main): Call test03 to test both system_clock and
	 steady_clock.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 33 +++++++++++++++++-
 1 file changed, 33 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 7fa9b03..84d94cf 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -51,17 +51,50 @@ void test02()
   VERIFY( status == std::future_status::timeout );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::timeout );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::timeout );
   l.unlock();  // allow async thread to proceed
   f1.wait();   // wait for it to finish
   status = f1.wait_for(std::chrono::milliseconds(0));
   VERIFY( status == std::future_status::ready );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::ready );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::ready );
+}
+
+// This test is prone to failures if run on a loaded machine where the
+// kernel decides not to schedule us for several seconds. It also
+// assumes that no-one will warp CLOCK whilst the test is
+// running when CLOCK is std::chrono::system_clock.
+template<typename CLOCK>
+void test03()
+{
+  auto const start = CLOCK::now();
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(2));
+    });
+  std::future_status status;
+
+  status = f1.wait_for(std::chrono::milliseconds(500));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(1));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(5));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = CLOCK::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(2) );
+  VERIFY( elapsed < std::chrono::seconds(5) );
 }
 
 int main()
 {
   test01();
   test02();
+  test03<std::chrono::system_clock>();
+  test03<std::chrono::steady_clock>();
   return 0;
 }
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 1/8] libstdc++: Improve async test Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-11-12 23:07   ` Jonathan Wakely
  2020-05-29  6:17 ` [PATCH v5 3/8] libstdc++ futex: Support waiting on std::chrono::steady_clock directly Mike Crowe
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
   relative timeout.

2. If someone changes the system clock during the wait then the futex
   timeout will correctly expire earlier or later.  Currently that only
   happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
that the code still works correctly with earlier kernel versions, an ENOSYS
error from futex[1] results in the futex_clock_realtime_unavailable flag
being set.  This flag is used to avoid the unnecessary unsupported futex
call in the future and to fall back to the previous gettimeofday and
relative time implementation.

glibc applied an equivalent switch in pthread_cond_timedwait to use
FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
glibc-2.10 back in 2009.  See
glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7

The futex_clock_realtime_unavailable flag is accessed using
std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
two calls to _M_futex_wait_until happen to happen simultaneously then the
only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
risk discovering that it doesn't work and, if so, both set the flag.

[1] This is how glibc's nptl-init.c determines whether these flags are
    supported.

	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
	futex flags.  Add futex_clock_realtime_unavailable flag to store
	result of trying to use
	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
	fall back to using gettimeofday and FUTEX_WAIT if that's not
	supported.
---
 libstdc++-v3/src/c++11/futex.cc | 37 ++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+)

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index c9de11a..25b3e05 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,8 +35,16 @@
 
 // Constants for the wait/wake futex syscall operations
 const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;
 
+namespace
+{
+  std::atomic<bool> futex_clock_realtime_unavailable;
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -58,6 +66,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
     else
       {
+	if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
+	  {
+	    struct timespec rt;
+	    rt.tv_sec = __s.count();
+	    rt.tv_nsec = __ns.count();
+	    if (syscall (SYS_futex, __addr,
+			 futex_wait_bitset_op | futex_clock_realtime_flag,
+			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
+	      {
+		__glibcxx_assert(errno == EINTR || errno == EAGAIN
+				|| errno == ETIMEDOUT || errno == ENOSYS);
+		if (errno == ETIMEDOUT)
+		  return false;
+		if (errno == ENOSYS)
+		  {
+		    futex_clock_realtime_unavailable.store(true,
+						    std::memory_order_relaxed);
+		    // Fall through to legacy implementation if the system
+		    // call is unavailable.
+		  }
+		else
+		  return true;
+	      }
+	    else
+	      return true;
+	  }
+
+	// We only get to here if futex_clock_realtime_unavailable was
+	// true or has just been set to true.
 	struct timeval tv;
 	gettimeofday (&tv, NULL);
 	// Convert the absolute timeout value to a relative timeout
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 3/8] libstdc++ futex: Support waiting on std::chrono::steady_clock directly
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 1/8] libstdc++: Improve async test Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 4/8] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock Mike Crowe
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

The user-visible effect of this change is for std::future::wait_until to
use CLOCK_MONOTONIC when passed a timeout of std::chrono::steady_clock
type.  This makes it immune to any changes made to the system clock
CLOCK_REALTIME.

Add an overload of __atomic_futex_unsigned::_M_load_and_text_until_impl
that accepts a std::chrono::steady_clock, and correctly passes this through
to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses
CLOCK_MONOTONIC for the timeout within the futex system call.  These
functions are mostly just copies of the std::chrono::system_clock versions
with small tweaks.

Prior to this commit, a std::chrono::steady timeout would be converted via
std::chrono::system_clock which risks reducing or increasing the timeout if
someone changes CLOCK_REALTIME whilst the wait is happening.  (The commit
immediately prior to this one increases the window of opportunity for that
from a short period during the calculation of a relative timeout, to the
entire duration of the wait.)

FUTEX_WAIT_BITSET was added in kernel v2.6.25.  If futex reports ENOSYS to
indicate that this operation is not supported then the code falls back to
using clock_gettime(2) to calculate a relative time to wait for.

I believe that I've added this functionality in a way that it doesn't break
ABI compatibility, but that has made it more verbose and less type safe.  I
believe that it would be better to maintain the timeout as an instance of
the correct clock type all the way down to a single _M_futex_wait_until
function with an overload for each clock.  The current scheme of separating
out the seconds and nanoseconds early risks accidentally calling the wait
function for the wrong clock.  Unfortunately, doing this would break code
that compiled against the old header.

	* libstdc++-v3/config/abi/pre/gnu.ver: Update for addition of
          __atomic_futex_unsigned_base::_M_futex_wait_until_steady.

	* libstdc++-v3/include/bits/atomic_futex.h
          (__atomic_futex_unsigned_base): Add comments to clarify that
          _M_futex_wait_until _M_load_and_test_until use CLOCK_REALTIME.
          Declare new _M_futex_wait_until_steady and
          _M_load_and_text_until_steady methods that use CLOCK_MONOTONIC.
          Add _M_load_and_test_until_impl and _M_load_when_equal_until
          overloads that accept a steady_clock time_point and use these new
          methods.

	* libstdc++-v3/src/c++11/futex.cc: Include headers required for
	clock_gettime. Add futex_clock_monotonic_flag constant to tell
	futex to use CLOCK_MONOTONIC to match the existing
	futex_clock_realtime_flag.  Add futex_clock_monotonic_unavailable
	to store the result of trying to use
	CLOCK_MONOTONIC. (__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
	Add new variant of _M_futex_wait_until that uses CLOCK_MONOTONIC to
	support waiting using steady_clock.
---
 libstdc++-v3/config/abi/pre/gnu.ver      | 10 +--
 libstdc++-v3/include/bits/atomic_futex.h | 67 +++++++++++++++++++-
 libstdc++-v3/src/c++11/futex.cc          | 82 +++++++++++++++++++++++++-
 3 files changed, 154 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index edf4485..3d734d7 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1916,10 +1916,9 @@ GLIBCXX_3.4.21 {
     _ZNSt7codecvtID[is]c*;
     _ZT[ISV]St7codecvtID[is]c*E;
 
-    extern "C++"
-    {
-      std::__atomic_futex_unsigned_base*;
-    };
+    # std::__atomic_futex_unsigned_base members
+    _ZNSt28__atomic_futex_unsigned_base19_M_futex_notify_all*;
+    _ZNSt28__atomic_futex_unsigned_base19_M_futex_wait_until*;
 
     # codecvt_utf8 etc.
     _ZNKSt19__codecvt_utf8_base*;
@@ -2297,6 +2296,9 @@ GLIBCXX_3.4.28 {
     _ZNSt3pmr25monotonic_buffer_resourceD[0125]Ev;
     _ZT[ISV]NSt3pmr25monotonic_buffer_resourceE;
 
+    # std::__atomic_futex_unsigned_base::_M_futex_wait_until_steady
+    _ZNSt28__atomic_futex_unsigned_base26_M_futex_wait_until_steady*;
+
 } GLIBCXX_3.4.27;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 886fc63..507c5c9 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -52,11 +52,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1
   struct __atomic_futex_unsigned_base
   {
-    // Returns false iff a timeout occurred.
+    // __s and __ns are measured against CLOCK_REALTIME. Returns false
+    // iff a timeout occurred.
     bool
     _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout,
 	chrono::seconds __s, chrono::nanoseconds __ns);
 
+    // __s and __ns are measured against CLOCK_MONOTONIC. Returns
+    // false iff a timeout occurred.
+    bool
+    _M_futex_wait_until_steady(unsigned *__addr, unsigned __val,
+	bool __has_timeout, chrono::seconds __s, chrono::nanoseconds __ns);
+
     // This can be executed after the object has been destroyed.
     static void _M_futex_notify_all(unsigned* __addr);
   };
@@ -86,6 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     // value if equal is false.
     // The assumed value is the caller's assumption about the current value
     // when making the call.
+    // __s and __ns are measured against CLOCK_REALTIME.
     unsigned
     _M_load_and_test_until(unsigned __assumed, unsigned __operand,
 	bool __equal, memory_order __mo, bool __has_timeout,
@@ -110,6 +118,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
     }
 
+    // If a timeout occurs, returns a current value after the timeout;
+    // otherwise, returns the operand's value if equal is true or a different
+    // value if equal is false.
+    // The assumed value is the caller's assumption about the current value
+    // when making the call.
+    // __s and __ns are measured against CLOCK_MONOTONIC.
+    unsigned
+    _M_load_and_test_until_steady(unsigned __assumed, unsigned __operand,
+	bool __equal, memory_order __mo, bool __has_timeout,
+	chrono::seconds __s, chrono::nanoseconds __ns)
+    {
+      for (;;)
+	{
+	  // Don't bother checking the value again because we expect the caller
+	  // to have done it recently.
+	  // memory_order_relaxed is sufficient because we can rely on just the
+	  // modification order (store_notify uses an atomic RMW operation too),
+	  // and the futex syscalls synchronize between themselves.
+	  _M_data.fetch_or(_Waiter_bit, memory_order_relaxed);
+	  bool __ret = _M_futex_wait_until_steady((unsigned*)(void*)&_M_data,
+					   __assumed | _Waiter_bit,
+					   __has_timeout, __s, __ns);
+	  // Fetch the current value after waiting (clears _Waiter_bit).
+	  __assumed = _M_load(__mo);
+	  if (!__ret || ((__operand == __assumed) == __equal))
+	    return __assumed;
+	  // TODO adapt wait time
+	}
+    }
+
     // Returns the operand's value if equal is true or a different value if
     // equal is false.
     // The assumed value is the caller's assumption about the current value
@@ -140,6 +178,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  true, __s.time_since_epoch(), __ns);
     }
 
+    template<typename _Dur>
+    unsigned
+    _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand,
+	bool __equal, memory_order __mo,
+	const chrono::time_point<std::chrono::steady_clock, _Dur>& __atime)
+    {
+      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
+      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+      // XXX correct?
+      return _M_load_and_test_until_steady(__assumed, __operand, __equal, __mo,
+	  true, __s.time_since_epoch(), __ns);
+    }
+
   public:
 
     _GLIBCXX_ALWAYS_INLINE unsigned
@@ -200,6 +251,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return (__i & ~_Waiter_bit) == __val;
     }
 
+    // Returns false iff a timeout occurred.
+    template<typename _Duration>
+    _GLIBCXX_ALWAYS_INLINE bool
+    _M_load_when_equal_until(unsigned __val, memory_order __mo,
+	const chrono::time_point<std::chrono::steady_clock, _Duration>& __atime)
+    {
+      unsigned __i = _M_load(__mo);
+      if ((__i & ~_Waiter_bit) == __val)
+	return true;
+      // TODO Spin-wait first.  Ignore effect on timeout.
+      __i = _M_load_and_test_until_impl(__i, __val, true, __mo, __atime);
+      return (__i & ~_Waiter_bit) == __val;
+    }
+
     _GLIBCXX_ALWAYS_INLINE void
     _M_store_notify_all(unsigned __val, memory_order __mo)
     {
diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 25b3e05..0331bd6 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -33,9 +33,15 @@
 #include <errno.h>
 #include <debug/debug.h>
 
+#ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+#endif
+
 // Constants for the wait/wake futex syscall operations
 const unsigned futex_wait_op = 0;
 const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_monotonic_flag = 0;
 const unsigned futex_clock_realtime_flag = 256;
 const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;
@@ -43,6 +49,7 @@ const unsigned futex_wake_op = 1;
 namespace
 {
   std::atomic<bool> futex_clock_realtime_unavailable;
+  std::atomic<bool> futex_clock_monotonic_unavailable;
 }
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -121,6 +128,81 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       }
   }
 
+  bool
+  __atomic_futex_unsigned_base::_M_futex_wait_until_steady(unsigned *__addr,
+      unsigned __val,
+      bool __has_timeout, chrono::seconds __s, chrono::nanoseconds __ns)
+  {
+    if (!__has_timeout)
+      {
+	// Ignore whether we actually succeeded to block because at worst,
+	// we will fall back to spin-waiting.  The only thing we could do
+	// here on errors is abort.
+	int ret __attribute__((unused));
+	ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
+	__glibcxx_assert(ret == 0 || errno == EINTR || errno == EAGAIN);
+	return true;
+      }
+    else
+      {
+	if (!futex_clock_monotonic_unavailable.load(std::memory_order_relaxed))
+	  {
+	    struct timespec rt;
+	    rt.tv_sec = __s.count();
+	    rt.tv_nsec = __ns.count();
+
+	    if (syscall (SYS_futex, __addr,
+			 futex_wait_bitset_op | futex_clock_monotonic_flag,
+			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
+	      {
+		__glibcxx_assert(errno == EINTR || errno == EAGAIN
+				 || errno == ETIMEDOUT || errno == ENOSYS);
+		if (errno == ETIMEDOUT)
+		  return false;
+		else if (errno == ENOSYS)
+		  {
+		    futex_clock_monotonic_unavailable.store(true,
+						    std::memory_order_relaxed);
+		    // Fall through to legacy implementation if the system
+		    // call is unavailable.
+		  }
+		else
+		  return true;
+	      }
+	  }
+
+	// We only get to here if futex_clock_monotonic_unavailable was
+	// true or has just been set to true.
+	struct timespec ts;
+#ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
+	syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
+#else
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+#endif
+	// Convert the absolute timeout value to a relative timeout
+	struct timespec rt;
+	rt.tv_sec = __s.count() - ts.tv_sec;
+	rt.tv_nsec = __ns.count() - ts.tv_nsec;
+	if (rt.tv_nsec < 0)
+	  {
+	    rt.tv_nsec += 1000000000;
+	    --rt.tv_sec;
+	  }
+	// Did we already time out?
+	if (rt.tv_sec < 0)
+	  return false;
+
+	if (syscall (SYS_futex, __addr, futex_wait_op, __val, &rt) == -1)
+	  {
+	    __glibcxx_assert(errno == EINTR || errno == EAGAIN
+			     || errno == ETIMEDOUT);
+	    if (errno == ETIMEDOUT)
+	      return false;
+	  }
+	return true;
+      }
+  }
+
   void
   __atomic_futex_unsigned_base::_M_futex_notify_all(unsigned* __addr)
   {
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 4/8] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (2 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 3/8] libstdc++ futex: Support waiting on std::chrono::steady_clock directly Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock Mike Crowe
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

The user-visible effect of this change is that std::future::wait_for now
uses std::chrono::steady_clock to determine the timeout.  This makes it
immune to changes made to the system clock.  It also means that anyone
using their own clock types with std::future::wait_until will have the
timeout converted to std::chrono::steady_clock rather than
std::chrono::system_clock.

Now that use of both std::chrono::steady_clock and
std::chrono::system_clock are correctly supported for the wait timeout, I
believe that std::chrono::steady_clock is a better choice for the reference
clock that all other clocks are converted to since it is guaranteed to
advance steadily.  The previous behaviour of converting to
std::chrono::system_clock risks timeouts changing dramatically when the
system clock is changed.

	* libstdc++-v3/include/bits/atomic_futex.h:
	(__atomic_futex_unsigned): Change __clock_t typedef to use
	steady_clock so that unknown clocks are synced to it rather than
	system_clock. Change existing __clock_t overloads of
	_M_load_and_text_until_impl and _M_load_when_equal_until to use
	system_clock explicitly. Remove comment about DR 887 since these
	changes address that problem as best as we currently able.
---
 libstdc++-v3/include/bits/atomic_futex.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 507c5c9..4375129 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template <unsigned _Waiter_bit = 0x80000000>
   class __atomic_futex_unsigned : __atomic_futex_unsigned_base
   {
-    typedef chrono::system_clock __clock_t;
+    typedef chrono::steady_clock __clock_t;
 
     // This must be lock-free and at offset 0.
     atomic<unsigned> _M_data;
@@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     unsigned
     _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand,
 	bool __equal, memory_order __mo,
-	const chrono::time_point<__clock_t, _Dur>& __atime)
+	const chrono::time_point<std::chrono::system_clock, _Dur>& __atime)
     {
       auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
       auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
@@ -229,7 +229,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_load_when_equal_until(unsigned __val, memory_order __mo,
 	  const chrono::time_point<_Clock, _Duration>& __atime)
       {
-	// DR 887 - Sync unknown clock to known clock.
 	const typename _Clock::time_point __c_entry = _Clock::now();
 	const __clock_t::time_point __s_entry = __clock_t::now();
 	const auto __delta = __atime - __c_entry;
@@ -241,7 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     template<typename _Duration>
     _GLIBCXX_ALWAYS_INLINE bool
     _M_load_when_equal_until(unsigned __val, memory_order __mo,
-	const chrono::time_point<__clock_t, _Duration>& __atime)
+	const chrono::time_point<std::chrono::system_clock, _Duration>& __atime)
     {
       unsigned __i = _M_load(__mo);
       if ((__i & ~_Waiter_bit) == __val)
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (3 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 4/8] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-09-11  9:47   ` Jonathan Wakely
  2020-05-29  6:17 ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.

There's no guarantee that __clock_t is running at the same speed as the
caller's clock, so if the underlying wait times out timeout we need to
check the timeout against the caller's clock again before potentially
looping.

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
  std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
  that takes an arbitrary clock to be called.

* invent test04 which uses a deliberately slow running clock in order to
  exercise the looping behaviour o
  __atomic_futex_unsigned::_M_load_when_equal_until described above.

	* libstdc++-v3/include/bits/atomic_futex.h:
	(__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
	generic _Clock to check the timeout against _Clock again after
	_M_load_when_equal_until returns indicating a timeout.

	* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
	slow_clock that runs at an eleventh of steady_clock's speed. Use it
	to test the user-supplied-clock variant of
	__atomic_futex_unsigned::_M_load_when_equal_until works generally
	with test03 and loops correctly when the timeout time hasn't been
	reached in test04.
---
 libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +++++++++++++++++-
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 4375129..5f95ade 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_load_when_equal_until(unsigned __val, memory_order __mo,
 	  const chrono::time_point<_Clock, _Duration>& __atime)
       {
-	const typename _Clock::time_point __c_entry = _Clock::now();
-	const __clock_t::time_point __s_entry = __clock_t::now();
-	const auto __delta = __atime - __c_entry;
-	const auto __s_atime = __s_entry + __delta;
-	return _M_load_when_equal_until(__val, __mo, __s_atime);
+	typename _Clock::time_point __c_entry = _Clock::now();
+	do {
+	  const __clock_t::time_point __s_entry = __clock_t::now();
+	  const auto __delta = __atime - __c_entry;
+	  const auto __s_atime = __s_entry + __delta;
+	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
+	    return true;
+	  __c_entry = _Clock::now();
+	} while (__c_entry < __atime);
+	return false;
       }
 
     // Returns false iff a timeout occurred.
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 84d94cf..ee117f4 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -63,6 +63,24 @@ void test02()
   VERIFY( status == std::future_status::ready );
 }
 
+// This clock behaves exactly the same as steady_clock, but it is not
+// steady_clock which means that the generic clock overload of
+// future::wait_until is used.
+struct steady_clock_copy
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point<steady_clock_copy, duration>;
+  static constexpr bool is_steady = true;
+
+  static time_point now()
+  {
+    const auto steady = std::chrono::steady_clock::now();
+    return time_point{steady.time_since_epoch()};
+  }
+};
+
 // This test is prone to failures if run on a loaded machine where the
 // kernel decides not to schedule us for several seconds. It also
 // assumes that no-one will warp CLOCK whilst the test is
@@ -90,11 +108,63 @@ void test03()
   VERIFY( elapsed < std::chrono::seconds(5) );
 }
 
+// This clock is supposed to run at a tenth of normal speed, but we
+// don't have to worry about rounding errors causing us to wake up
+// slightly too early below if we actually run it at an eleventh of
+// normal speed. It is used to exercise the
+// __atomic_futex_unsigned::_M_load_when_equal_until overload that
+// takes an arbitrary clock.
+struct slow_clock
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point<slow_clock, duration>;
+  static constexpr bool is_steady = true;
+
+  static time_point now()
+  {
+    const auto steady = std::chrono::steady_clock::now();
+    return time_point{steady.time_since_epoch() / 11};
+  }
+};
+
+void test04()
+{
+  using namespace std::chrono;
+
+  auto const slow_start = slow_clock::now();
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(2));
+    });
+
+  // Wait for ~1s
+  {
+    auto const steady_begin = steady_clock::now();
+    auto const status = f1.wait_until(slow_start + milliseconds(100));
+    VERIFY(status == std::future_status::timeout);
+    auto const elapsed = steady_clock::now() - steady_begin;
+    VERIFY(elapsed >= seconds(1));
+    VERIFY(elapsed < seconds(2));
+  }
+
+  // Wait for up to ~2s more
+  {
+    auto const steady_begin = steady_clock::now();
+    auto const status = f1.wait_until(slow_start + milliseconds(300));
+    VERIFY(status == std::future_status::ready);
+    auto const elapsed = steady_clock::now() - steady_begin;
+    VERIFY(elapsed < seconds(2));
+  }
+}
+
 int main()
 {
   test01();
   test02();
   test03<std::chrono::system_clock>();
   test03<std::chrono::steady_clock>();
+  test03<steady_clock_copy>();
+  test04();
   return 0;
 }
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (4 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-09-11 14:40   ` Jonathan Wakely
  2020-05-29  6:17 ` [PATCH v5 7/8] libstdc++ condition_variable: Avoid rounding errors on custom clocks Mike Crowe
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Convert the specified duration to the target clock's duration type
before adding it to the current time in
__atomic_futex_unsigned::_M_load_when_equal_for and
_M_load_when_equal_until.  This removes the risk of the timeout being
rounded down to the current time resulting in there being no wait at
all when the duration type lacks sufficient precision to hold the
steady_clock current time.

Rather than using the style of fix from PR68519, let's expose the C++17
std::chrono::ceil function as std::chrono::__detail::ceil so that it can
be used in code compiled with earlier standards versions and simplify
the fix. This was suggested by John Salmon in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .

This problem has become considerably less likely to trigger since I
switched the __atomic__futex_unsigned::__clock_t reference clock from
system_clock to steady_clock and added the loop, but the consequences of
triggering it have changed too.

By my calculations it takes just over 194 days from the epoch for the
current time not to be representable in a float. This means that
system_clock is always subject to the problem (with the standard 1970
epoch) whereas steady_clock with float duration only runs out of
resolution machine has been running for that long (assuming the Linux
implementation of CLOCK_MONOTONIC.)

The recently-added loop in
__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
into a busy wait.

Unfortunately the combination of both of these things means that it's
not possible to write a test case for this occurring in
_M_load_when_equal_until as it stands.

	* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
          implementation of std::chrono::ceil into private namespace so
          that it's available to pre-C++17 code.

	* libstdc++-v3/include/bits/atomic_futex.h:
	  (__atomic_futex_unsigned::_M_load_when_equal_for,
	  __atomic_futex_unsigned::_M_load_when_equal_until): Use
	  __detail::ceil to convert delta to the reference clock
	  duration type to avoid resolution problems

	* libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486):
          New test for __atomic_futex_unsigned::_M_load_when_equal_for.
---
 libstdc++-v3/include/bits/atomic_futex.h         |  6 +++--
 libstdc++-v3/include/std/chrono                  | 19 +++++++++++++----
 libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +++++++++++++-
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index 5f95ade..aa137a7 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_load_when_equal_for(unsigned __val, memory_order __mo,
 	  const chrono::duration<_Rep, _Period>& __rtime)
       {
+	using __dur = typename __clock_t::duration;
 	return _M_load_when_equal_until(__val, __mo,
-					__clock_t::now() + __rtime);
+		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
       }
 
     // Returns false iff a timeout occurred.
@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	do {
 	  const __clock_t::time_point __s_entry = __clock_t::now();
 	  const auto __delta = __atime - __c_entry;
-	  const auto __s_atime = __s_entry + __delta;
+	  const auto __s_atime = __s_entry +
+	      chrono::__detail::ceil<_Duration>(__delta);
 	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
 	    return true;
 	  __c_entry = _Clock::now();
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 6d78f32..4257c7c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 #endif // C++20
 
+    // We want to use ceil even when compiling for earlier standards versions
+    namespace __detail
+    {
+      template<typename _ToDur, typename _Rep, typename _Period>
+        constexpr __enable_if_is_duration<_ToDur>
+        ceil(const duration<_Rep, _Period>& __d)
+        {
+	  auto __to = chrono::duration_cast<_ToDur>(__d);
+	  if (__to < __d)
+	    return __to + _ToDur{1};
+	  return __to;
+	}
+    }
+
 #if __cplusplus >= 201703L
 # define __cpp_lib_chrono 201611
 
@@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       constexpr __enable_if_is_duration<_ToDur>
       ceil(const duration<_Rep, _Period>& __d)
       {
-	auto __to = chrono::duration_cast<_ToDur>(__d);
-	if (__to < __d)
-	  return __to + _ToDur{1};
-	return __to;
+	return __detail::ceil<_ToDur>(__d);
       }
 
     template <typename _ToDur, typename _Rep, typename _Period>
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index ee117f4..f697292 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -158,6 +158,20 @@ void test04()
   }
 }
 
+void test_pr91486()
+{
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(1));
+    });
+
+  std::chrono::duration<float> const wait_time = std::chrono::seconds(1);
+  auto const start_steady = chrono::steady_clock::now();
+  auto status = f1.wait_for(wait_time);
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+
+  VERIFY( elapsed_steady >= std::chrono::seconds(1) );
+}
+
 int main()
 {
   test01();
@@ -166,5 +180,6 @@ int main()
   test03<std::chrono::steady_clock>();
   test03<steady_clock_copy>();
   test04();
+  test_pr91486();
   return 0;
 }
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 7/8] libstdc++ condition_variable: Avoid rounding errors on custom clocks
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (5 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-05-29  6:17 ` [PATCH v5 8/8] libstdc++: Extra async tests, not for merging Mike Crowe
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

The fix for PR68519 in 83fd5e73b3c16296e0d7ba54f6c547e01c7eae7b only
applied to condition_variable::wait_for. This problem can also apply to
condition_variable::wait_until but only if the custom clock is using a
more recent epoch so that a small enough delta can be calculated. let's
use the newly-added chrono::__detail::ceil to fix this and also make use
of that function to simplify the previous wait_for fixes.

Also, simplify the existing test case for PR68519 a little and make its
variables local so we can add a new test case for the above
problem. Unfortunately, the test would have only started failing if
sufficient time has passed since the chrono::steady_clock epoch had
passed anyway, but it's better than nothing.

	* libstdc++-v3/include/std/condition_variable:
          (condition_variable::wait_until): Convert delta to
          steady_clock duration before adding to current steady_clock
          time to avoid rounding errors described in
          PR68519. (condition_variable::wait_for): Simplify calculation
          of absolute time by using chrono::__detail::ceil in both
          overloads.  *
          libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc:
          (test_wait_for): Renamed from test01. Replace unassigned val
          variable with constant false. Reduce scope of mx and cv
          variables to just test_wait_for function. (test_wait_until):
          Add new test case.
---
 libstdc++-v3/include/std/condition_variable                           | 18 +++++++++---------
 libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/std/condition_variable b/libstdc++-v3/include/std/condition_variable
index 2db9dff..0796ca9 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -133,10 +133,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus > 201703L
 	static_assert(chrono::is_clock_v<_Clock>);
 #endif
+	using __s_dur = typename __clock_t::duration;
 	const typename _Clock::time_point __c_entry = _Clock::now();
 	const __clock_t::time_point __s_entry = __clock_t::now();
 	const auto __delta = __atime - __c_entry;
-	const auto __s_atime = __s_entry + __delta;
+	const auto __s_atime = __s_entry +
+	  chrono::__detail::ceil<__s_dur>(__delta);
 
 	if (__wait_until_impl(__lock, __s_atime) == cv_status::no_timeout)
 	  return cv_status::no_timeout;
@@ -166,10 +168,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	       const chrono::duration<_Rep, _Period>& __rtime)
       {
 	using __dur = typename steady_clock::duration;
-	auto __reltime = chrono::duration_cast<__dur>(__rtime);
-	if (__reltime < __rtime)
-	  ++__reltime;
-	return wait_until(__lock, steady_clock::now() + __reltime);
+	return wait_until(__lock,
+			  steady_clock::now() +
+			  chrono::__detail::ceil<__dur>(__rtime));
       }
 
     template<typename _Rep, typename _Period, typename _Predicate>
@@ -179,10 +180,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	       _Predicate __p)
       {
 	using __dur = typename steady_clock::duration;
-	auto __reltime = chrono::duration_cast<__dur>(__rtime);
-	if (__reltime < __rtime)
-	  ++__reltime;
-	return wait_until(__lock, steady_clock::now() + __reltime,
+	return wait_until(__lock,
+			  steady_clock::now() +
+			  chrono::__detail::ceil<__dur>(__rtime),
 			  std::move(__p));
       }
 
diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc
index 9a70713..739f74c 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/68519.cc
@@ -26,25 +26,72 @@
 
 // PR libstdc++/68519
 
-bool val = false;
-std::mutex mx;
-std::condition_variable cv;
-
 void
-test01()
+test_wait_for()
 {
+  std::mutex mx;
+  std::condition_variable cv;
+
   for (int i = 0; i < 3; ++i)
   {
     std::unique_lock<std::mutex> l(mx);
     auto start = std::chrono::system_clock::now();
-    cv.wait_for(l, std::chrono::duration<float>(1), [] { return val; });
+    cv.wait_for(l, std::chrono::duration<float>(1), [] { return false; });
     auto t = std::chrono::system_clock::now();
     VERIFY( (t - start) >= std::chrono::seconds(1) );
   }
 }
 
+// In order to ensure that the delta calculated in the arbitrary clock overload
+// of condition_variable::wait_until fits accurately in a float, but the result
+// of adding it to steady_clock with a float duration does not, this clock
+// needs to use a more recent epoch.
+struct recent_epoch_float_clock
+{
+  using rep = std::chrono::duration<float>::rep;
+  using period = std::chrono::duration<float>::period;
+  using time_point = std::chrono::time_point<recent_epoch_float_clock,
+    std::chrono::duration<float>>;
+  static constexpr bool is_steady = true;
+
+  static const std::chrono::steady_clock::time_point epoch;
+
+  static time_point now()
+  {
+    const auto steady = std::chrono::steady_clock::now();
+    return time_point{steady - epoch};
+  }
+};
+
+const std::chrono::steady_clock::time_point recent_epoch_float_clock::epoch =
+  std::chrono::steady_clock::now();
+
+void
+test_wait_until()
+{
+  using clock = recent_epoch_float_clock;
+
+  std::mutex mx;
+  std::condition_variable cv;
+
+  for (int i = 0; i < 3; ++i)
+  {
+    std::unique_lock<std::mutex> l(mx);
+    const auto start = clock::now();
+    const auto wait_time = start + std::chrono::duration<float>{1.0};
+
+    // In theory we could get a spurious wakeup, but in practice we won't.
+    const auto result = cv.wait_until(l, wait_time);
+
+    VERIFY( result == std::cv_status::timeout );
+    const auto elapsed = clock::now() - start;
+    VERIFY( elapsed >= std::chrono::seconds(1) );
+  }
+}
+
 int
 main()
 {
-  test01();
+  test_wait_for();
+  test_wait_until();
 }
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 8/8] libstdc++: Extra async tests, not for merging
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (6 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 7/8] libstdc++ condition_variable: Avoid rounding errors on custom clocks Mike Crowe
@ 2020-05-29  6:17 ` Mike Crowe
  2020-07-17  8:39 ` [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
  2020-09-11 13:26 ` Jonathan Wakely
  9 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-05-29  6:17 UTC (permalink / raw)
  To: libstdc++, gcc-patches

These tests show that changing the system clock has an effect on
std::future::wait_until when using std::chrono::system_clock but not when
using std::chrono::steady_clock.  Unfortunately these tests have a number
of downsides:

1. Nothing that is attempting to keep the clock set correctly (ntpd,
   systemd-timesyncd) can be running at the same time.

2. The test process requires the CAP_SYS_TIME capability (although, as it's
   written it checks for being root.)

3. Other processes running concurrently may misbehave when the clock darts
   back and forth.

4. They are slow to run.

As such, I don't think they are suitable for merging. I include them here
because I wanted to document how I had tested the changes in the previous
commits.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +++++++++++++++++-
 1 file changed, 70 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index f697292..8b44810 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -24,6 +24,7 @@
 
 #include <future>
 #include <testsuite_hooks.h>
+#include <sys/time.h>
 
 using namespace std;
 
@@ -172,6 +173,71 @@ void test_pr91486()
   VERIFY( elapsed_steady >= std::chrono::seconds(1) );
 }
 
+void perturb_system_clock(const std::chrono::seconds &seconds)
+{
+  struct timeval tv;
+  if (gettimeofday(&tv, NULL))
+    abort();
+
+  tv.tv_sec += seconds.count();
+  if (settimeofday(&tv, NULL))
+    abort();
+}
+
+// Ensure that advancing CLOCK_REALTIME doesn't make any difference
+// when we're waiting on std::chrono::steady_clock.
+void test05()
+{
+  auto const start = chrono::steady_clock::now();
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(10));
+    });
+
+  perturb_system_clock(chrono::seconds(20));
+
+  std::future_status status;
+  status = f1.wait_for(std::chrono::seconds(4));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(6));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(12));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = chrono::steady_clock::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(10) );
+  VERIFY( elapsed < std::chrono::seconds(15) );
+
+  perturb_system_clock(chrono::seconds(-20));
+}
+
+// Ensure that advancing CLOCK_REALTIME does make a difference when
+// we're waiting on std::chrono::system_clock.
+void test06()
+{
+  auto const start = chrono::system_clock::now();
+  auto const start_steady = chrono::steady_clock::now();
+
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(5));
+      perturb_system_clock(chrono::seconds(60));
+      std::this_thread::sleep_for(std::chrono::seconds(5));
+    });
+  future_status status;
+  status = f1.wait_until(start + std::chrono::seconds(60));
+  VERIFY( status == std::future_status::timeout );
+
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+  VERIFY( elapsed_steady >= std::chrono::seconds(5) );
+  VERIFY( elapsed_steady < std::chrono::seconds(10) );
+
+  status = f1.wait_until(start + std::chrono::seconds(75));
+  VERIFY( status == std::future_status::ready );
+
+  perturb_system_clock(chrono::seconds(-60));
+}
+
 int main()
 {
   test01();
@@ -181,5 +247,9 @@ int main()
   test03<steady_clock_copy>();
   test04();
   test_pr91486();
+  if (geteuid() == 0) {
+    test05();
+    test06();
+  }
   return 0;
 }
-- 
git-series 0.9.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (7 preceding siblings ...)
  2020-05-29  6:17 ` [PATCH v5 8/8] libstdc++: Extra async tests, not for merging Mike Crowe
@ 2020-07-17  8:39 ` Mike Crowe
  2020-07-17  9:34   ` Jonathan Wakely
  2020-09-11 13:26 ` Jonathan Wakely
  9 siblings, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-07-17  8:39 UTC (permalink / raw)
  To: libstdc++, gcc-patches

On Friday 29 May 2020 at 07:17:26 +0100, Mike Crowe wrote:
> This series ensures that the std::future::wait_* functions use
> std::chrono::steady_clock when required, introduces
> std::chrono::__detail::ceil to make that easier to do, and then makes
> use of that function to simplify and improve the fix for PR68519 in
> std::condition_variable.

[snip]

Ping. (Complete series at
https://gcc.gnu.org/pipermail/libstdc++/2020-May/050434.html )

Thanks.

Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
  2020-07-17  8:39 ` [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
@ 2020-07-17  9:34   ` Jonathan Wakely
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-07-17  9:34 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 17/07/20 09:39 +0100, Mike Crowe via Libstdc++ wrote:
>On Friday 29 May 2020 at 07:17:26 +0100, Mike Crowe wrote:
>> This series ensures that the std::future::wait_* functions use
>> std::chrono::steady_clock when required, introduces
>> std::chrono::__detail::ceil to make that easier to do, and then makes
>> use of that function to simplify and improve the fix for PR68519 in
>> std::condition_variable.
>
>[snip]
>
>Ping. (Complete series at
>https://gcc.gnu.org/pipermail/libstdc++/2020-May/050434.html )

Yep, it's in the (very long) review queue.

It all looks fine from what I've looked at so far.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock
  2020-05-29  6:17 ` [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock Mike Crowe
@ 2020-09-11  9:47   ` Jonathan Wakely
  2020-09-16 11:06     ` Mike Crowe
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Wakely @ 2020-09-11  9:47 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

I'm finally getting round to merging this series!


On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>If std::future::wait_until is passed a time point measured against a clock
>that is neither std::chrono::steady_clock nor std::chrono::system_clock
>then the generic implementation of
>__atomic_futex_unsigned::_M_load_when_equal_until is called which
>calculates the timeout based on __clock_t and calls the
>_M_load_when_equal_until method for that clock to perform the actual wait.
>
>There's no guarantee that __clock_t is running at the same speed as the
>caller's clock, so if the underlying wait times out timeout we need to
>check the timeout against the caller's clock again before potentially
>looping.
>
>Also add two extra tests to the testsuite's async.cc:
>
>* run test03 with steady_clock_copy, which behaves identically to
>  std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
>  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
>  that takes an arbitrary clock to be called.
>
>* invent test04 which uses a deliberately slow running clock in order to
>  exercise the looping behaviour o
>  __atomic_futex_unsigned::_M_load_when_equal_until described above.
>
>	* libstdc++-v3/include/bits/atomic_futex.h:
>	(__atomic_futex_unsigned) Add loop to _M_load_when_equal_until on
>	generic _Clock to check the timeout against _Clock again after
>	_M_load_when_equal_until returns indicating a timeout.
>
>	* libstdc++-v3/testsuite/30_threads/async/async.cc: Invent
>	slow_clock that runs at an eleventh of steady_clock's speed. Use it
>	to test the user-supplied-clock variant of
>	__atomic_futex_unsigned::_M_load_when_equal_until works generally
>	with test03 and loops correctly when the timeout time hasn't been
>	reached in test04.
>---
> libstdc++-v3/include/bits/atomic_futex.h         | 15 ++--
> libstdc++-v3/testsuite/30_threads/async/async.cc | 70 +++++++++++++++++-
> 2 files changed, 80 insertions(+), 5 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>index 4375129..5f95ade 100644
>--- a/libstdc++-v3/include/bits/atomic_futex.h
>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _M_load_when_equal_until(unsigned __val, memory_order __mo,
> 	  const chrono::time_point<_Clock, _Duration>& __atime)
>       {
>-	const typename _Clock::time_point __c_entry = _Clock::now();
>-	const __clock_t::time_point __s_entry = __clock_t::now();
>-	const auto __delta = __atime - __c_entry;
>-	const auto __s_atime = __s_entry + __delta;
>-	return _M_load_when_equal_until(__val, __mo, __s_atime);
>+	typename _Clock::time_point __c_entry = _Clock::now();
>+	do {
>+	  const __clock_t::time_point __s_entry = __clock_t::now();
>+	  const auto __delta = __atime - __c_entry;
>+	  const auto __s_atime = __s_entry + __delta;
>+	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
>+	    return true;

I wonder if we can avoid looping if _Clock::is_steady is true i.e.

           if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
             return false

If _Clock is steady then it can't be adjusted to move backwards. I am
not 100% sure, but I don't think a steady clock can run "slow" either.
But I'm checking with LWG about that. We should keep the loop for now.

The reason for wanting to return early is that _Clock::now() can be
quite expensive, so avoiding the second call if it's not needed would
be significant.

Related to this, I've been experimenting with a change to wait_for
that checks __rtime <= 0 and if so, uses __clock_t::time_point::min()
for the wait_until call, so that we don't bother to call
__clock_t::now(). For a non-positive duration we don't need to
determine the precise time_point in the past, because it's in the past
anyway. Using time_point::min() is as good as __clock_t::now() - rtime
(as long as we don't overflow anything when converting it to a struct
timespec, or course!)

Using future.wait_for(0s) to poll for a future becoming ready takes a
long time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
Using wait_until(C::time_point::min()) is faster, because it doesn't
call C::now().

This probably isn't important for most timed waiting functions in the
library, because I don't see any reason to use wait_for(0s) to poll a
mutex, condition_variable or atomic. But it's reasonable to do for
futures.

I already added (__rtime <= __rtime.zero()) to this_thread::sleep_for
in d1a74a287ee1a84b90e5675904dac7f945cffca1.

The extra branch to check rtime <= rtime.zero() or atime < C::now() is
probably insignificant compared to the cost of unnecessary calls to
now() on one or more clocks.

>+	  __c_entry = _Clock::now();
>+	} while (__c_entry < __atime);
>+	return false;
>       }
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements
  2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
                   ` (8 preceding siblings ...)
  2020-07-17  8:39 ` [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
@ 2020-09-11 13:26 ` Jonathan Wakely
  9 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-09-11 13:26 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>This series ensures that the std::future::wait_* functions use
>std::chrono::steady_clock when required, introduces
>std::chrono::__detail::ceil to make that easier to do, and then makes
>use of that function to simplify and improve the fix for PR68519 in
>std::condition_variable.
>
>v1 of this series was originally posted back in September 2017 (see
>https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html )
>
>v2 of this series was originally posted back in January 2018 (see
>https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html )
>
>v3 of this series was originally posted back in August 2018 (see
>https://gcc.gnu.org/ml/libstdc++/2018-08/msg00001.html )
>
>v4 of this series was originally posted back in October 2019 (see
>https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01934.html )
>
>Changes since v4:
>
>* Expose std::chrono::ceil as std::chrono::__detail::ceil so that it
>  can be used to fix PR91486 in std::future::wait_for (as suggested by
>  John Salmon in PR91486.)
>
>* Use std::chrono::__detail::ceil to simplify fix for PR68519 in
>  std::condition_variable::wait_for.
>
>* Also fix equivalent of PR68519 in
>  std::condition_variable::wait_until and add test.

I've pushed this series (except for patch 8/8) to trunk now, with only
minor indentation changes in the code and some ChangeLog reformatting.

Thanks for your persistence in getting this fixed, and sorry it took
so many years!


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-05-29  6:17 ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
@ 2020-09-11 14:40   ` Jonathan Wakely
  2020-09-11 17:22     ` Jonathan Wakely
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Wakely @ 2020-09-11 14:40 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>Convert the specified duration to the target clock's duration type
>before adding it to the current time in
>__atomic_futex_unsigned::_M_load_when_equal_for and
>_M_load_when_equal_until.  This removes the risk of the timeout being
>rounded down to the current time resulting in there being no wait at
>all when the duration type lacks sufficient precision to hold the
>steady_clock current time.
>
>Rather than using the style of fix from PR68519, let's expose the C++17
>std::chrono::ceil function as std::chrono::__detail::ceil so that it can
>be used in code compiled with earlier standards versions and simplify
>the fix. This was suggested by John Salmon in
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .
>
>This problem has become considerably less likely to trigger since I
>switched the __atomic__futex_unsigned::__clock_t reference clock from
>system_clock to steady_clock and added the loop, but the consequences of
>triggering it have changed too.
>
>By my calculations it takes just over 194 days from the epoch for the
>current time not to be representable in a float. This means that
>system_clock is always subject to the problem (with the standard 1970
>epoch) whereas steady_clock with float duration only runs out of
>resolution machine has been running for that long (assuming the Linux
>implementation of CLOCK_MONOTONIC.)
>
>The recently-added loop in
>__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
>into a busy wait.
>
>Unfortunately the combination of both of these things means that it's
>not possible to write a test case for this occurring in
>_M_load_when_equal_until as it stands.
>
>	* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
>          implementation of std::chrono::ceil into private namespace so
>          that it's available to pre-C++17 code.
>
>	* libstdc++-v3/include/bits/atomic_futex.h:
>	  (__atomic_futex_unsigned::_M_load_when_equal_for,
>	  __atomic_futex_unsigned::_M_load_when_equal_until): Use
>	  __detail::ceil to convert delta to the reference clock
>	  duration type to avoid resolution problems
>
>	* libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486):
>          New test for __atomic_futex_unsigned::_M_load_when_equal_for.
>---
> libstdc++-v3/include/bits/atomic_futex.h         |  6 +++--
> libstdc++-v3/include/std/chrono                  | 19 +++++++++++++----
> libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +++++++++++++-
> 3 files changed, 34 insertions(+), 6 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>index 5f95ade..aa137a7 100644
>--- a/libstdc++-v3/include/bits/atomic_futex.h
>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>@@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _M_load_when_equal_for(unsigned __val, memory_order __mo,
> 	  const chrono::duration<_Rep, _Period>& __rtime)
>       {
>+	using __dur = typename __clock_t::duration;
> 	return _M_load_when_equal_until(__val, __mo,
>-					__clock_t::now() + __rtime);
>+		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
>       }
>
>     // Returns false iff a timeout occurred.
>@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	do {
> 	  const __clock_t::time_point __s_entry = __clock_t::now();
> 	  const auto __delta = __atime - __c_entry;
>-	  const auto __s_atime = __s_entry + __delta;
>+	  const auto __s_atime = __s_entry +
>+	      chrono::__detail::ceil<_Duration>(__delta);
> 	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
> 	    return true;
> 	  __c_entry = _Clock::now();
>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>index 6d78f32..4257c7c 100644
>--- a/libstdc++-v3/include/std/chrono
>+++ b/libstdc++-v3/include/std/chrono
>@@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> #endif
> #endif // C++20
>
>+    // We want to use ceil even when compiling for earlier standards versions
>+    namespace __detail
>+    {
>+      template<typename _ToDur, typename _Rep, typename _Period>
>+        constexpr __enable_if_is_duration<_ToDur>
>+        ceil(const duration<_Rep, _Period>& __d)
>+        {
>+	  auto __to = chrono::duration_cast<_ToDur>(__d);
>+	  if (__to < __d)
>+	    return __to + _ToDur{1};
>+	  return __to;
>+	}
>+    }
>+
> #if __cplusplus >= 201703L
> # define __cpp_lib_chrono 201611
>
>@@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       constexpr __enable_if_is_duration<_ToDur>
>       ceil(const duration<_Rep, _Period>& __d)
>       {
>-	auto __to = chrono::duration_cast<_ToDur>(__d);
>-	if (__to < __d)
>-	  return __to + _ToDur{1};
>-	return __to;
>+	return __detail::ceil<_ToDur>(__d);

This isn't valid in C++11, a constexpr function needs to be just a
return statement. Fix incoming ...


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-09-11 14:40   ` Jonathan Wakely
@ 2020-09-11 17:22     ` Jonathan Wakely
  2020-09-11 18:59       ` Jonathan Wakely
  2020-09-19 10:50       ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
  0 siblings, 2 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-09-11 17:22 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5354 bytes --]

On 11/09/20 15:41 +0100, Jonathan Wakely wrote:
>On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>>Convert the specified duration to the target clock's duration type
>>before adding it to the current time in
>>__atomic_futex_unsigned::_M_load_when_equal_for and
>>_M_load_when_equal_until.  This removes the risk of the timeout being
>>rounded down to the current time resulting in there being no wait at
>>all when the duration type lacks sufficient precision to hold the
>>steady_clock current time.
>>
>>Rather than using the style of fix from PR68519, let's expose the C++17
>>std::chrono::ceil function as std::chrono::__detail::ceil so that it can
>>be used in code compiled with earlier standards versions and simplify
>>the fix. This was suggested by John Salmon in
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .
>>
>>This problem has become considerably less likely to trigger since I
>>switched the __atomic__futex_unsigned::__clock_t reference clock from
>>system_clock to steady_clock and added the loop, but the consequences of
>>triggering it have changed too.
>>
>>By my calculations it takes just over 194 days from the epoch for the
>>current time not to be representable in a float. This means that
>>system_clock is always subject to the problem (with the standard 1970
>>epoch) whereas steady_clock with float duration only runs out of
>>resolution machine has been running for that long (assuming the Linux
>>implementation of CLOCK_MONOTONIC.)
>>
>>The recently-added loop in
>>__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
>>into a busy wait.
>>
>>Unfortunately the combination of both of these things means that it's
>>not possible to write a test case for this occurring in
>>_M_load_when_equal_until as it stands.
>>
>>	* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
>>         implementation of std::chrono::ceil into private namespace so
>>         that it's available to pre-C++17 code.
>>
>>	* libstdc++-v3/include/bits/atomic_futex.h:
>>	  (__atomic_futex_unsigned::_M_load_when_equal_for,
>>	  __atomic_futex_unsigned::_M_load_when_equal_until): Use
>>	  __detail::ceil to convert delta to the reference clock
>>	  duration type to avoid resolution problems
>>
>>	* libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486):
>>         New test for __atomic_futex_unsigned::_M_load_when_equal_for.
>>---
>>libstdc++-v3/include/bits/atomic_futex.h         |  6 +++--
>>libstdc++-v3/include/std/chrono                  | 19 +++++++++++++----
>>libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +++++++++++++-
>>3 files changed, 34 insertions(+), 6 deletions(-)
>>
>>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>>index 5f95ade..aa137a7 100644
>>--- a/libstdc++-v3/include/bits/atomic_futex.h
>>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>>@@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      _M_load_when_equal_for(unsigned __val, memory_order __mo,
>>	  const chrono::duration<_Rep, _Period>& __rtime)
>>      {
>>+	using __dur = typename __clock_t::duration;
>>	return _M_load_when_equal_until(__val, __mo,
>>-					__clock_t::now() + __rtime);
>>+		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
>>      }
>>
>>    // Returns false iff a timeout occurred.
>>@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>	do {
>>	  const __clock_t::time_point __s_entry = __clock_t::now();
>>	  const auto __delta = __atime - __c_entry;
>>-	  const auto __s_atime = __s_entry + __delta;
>>+	  const auto __s_atime = __s_entry +
>>+	      chrono::__detail::ceil<_Duration>(__delta);

I'm testing the attached patch to fix the C++11 constexpr error, but
while re-looking at the uses of __detail::ceil I noticed this is using
_Duration as the target type. Shouldn't that be __clock_t::duration
instead? Why do we care about the duration of the user's time_point
here, rather than the preferred duration of the clock we're about to
wait against?


>>	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
>>	    return true;
>>	  __c_entry = _Clock::now();
>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>>index 6d78f32..4257c7c 100644
>>--- a/libstdc++-v3/include/std/chrono
>>+++ b/libstdc++-v3/include/std/chrono
>>@@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>#endif
>>#endif // C++20
>>
>>+    // We want to use ceil even when compiling for earlier standards versions
>>+    namespace __detail
>>+    {
>>+      template<typename _ToDur, typename _Rep, typename _Period>
>>+        constexpr __enable_if_is_duration<_ToDur>
>>+        ceil(const duration<_Rep, _Period>& __d)
>>+        {
>>+	  auto __to = chrono::duration_cast<_ToDur>(__d);
>>+	  if (__to < __d)
>>+	    return __to + _ToDur{1};
>>+	  return __to;
>>+	}
>>+    }
>>+
>>#if __cplusplus >= 201703L
>># define __cpp_lib_chrono 201611
>>
>>@@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      constexpr __enable_if_is_duration<_ToDur>
>>      ceil(const duration<_Rep, _Period>& __d)
>>      {
>>-	auto __to = chrono::duration_cast<_ToDur>(__d);
>>-	if (__to < __d)
>>-	  return __to + _ToDur{1};
>>-	return __to;
>>+	return __detail::ceil<_ToDur>(__d);
>
>This isn't valid in C++11, a constexpr function needs to be just a
>return statement. Fix incoming ...




[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 3379 bytes --]

commit 2c56e931c694f98ba77c02163b69a62242f23a3b
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Sep 11 18:09:46 2020

    libstdc++: Fix chrono::__detail::ceil to work with C++11
    
    In C++11 constexpr functions can only have a return statement, so we
    need to fix __detail::ceil to make it valid in C++11. This can be done
    by moving the comparison and increment into a new function, __ceil_impl,
    and calling that with the result of the duration_cast.
    
    This would mean the standard C++17 std::chrono::ceil function would make
    two further calls, which would add too much overhead when not inlined.
    For C++17 and later use a using-declaration to add chrono::ceil to
    namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
    as a C++11-compatible constexpr function template.
    
    libstdc++-v3/ChangeLog:
    
            * include/std/chrono [C++17] (chrono::__detail::ceil): Add
            using declaration to make chrono::ceil available for internal
            use with a consistent name.
            (chrono::__detail::__ceil_impl): New function template.
            (chrono::__detail::ceil): Use __ceil_impl to compare and
            increment the value. Remove SFINAE constraint.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 893d1f6b2c9..2db2b7d0edc 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -329,20 +329,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 #endif // C++20
 
-    // We want to use ceil even when compiling for earlier standards versions
-    namespace __detail
-    {
-      template<typename _ToDur, typename _Rep, typename _Period>
-	constexpr __enable_if_is_duration<_ToDur>
-	ceil(const duration<_Rep, _Period>& __d)
-	{
-	  auto __to = chrono::duration_cast<_ToDur>(__d);
-	  if (__to < __d)
-	    return __to + _ToDur{1};
-	  return __to;
-	}
-    }
-
 #if __cplusplus >= 201703L
 # define __cpp_lib_chrono 201611
 
@@ -360,7 +346,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       constexpr __enable_if_is_duration<_ToDur>
       ceil(const duration<_Rep, _Period>& __d)
       {
-	return __detail::ceil<_ToDur>(__d);
+	auto __to = chrono::duration_cast<_ToDur>(__d);
+	if (__to < __d)
+	  return __to - _ToDur{1};
+	return __to;
       }
 
     template <typename _ToDur, typename _Rep, typename _Period>
@@ -394,6 +383,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return __d;
 	return -__d;
       }
+
+    // Make chrono::ceil<D> also usable as chrono::__detail::ceil<D>.
+    namespace __detail { using chrono::ceil; }
+
+#else // ! C++17
+
+    // We want to use ceil even when compiling for earlier standards versions.
+    // C++11 only allows a single statement in a constexpr function, so we
+    // need to move the comparison into a separate function, __ceil_impl.
+    namespace __detail
+    {
+      template<typename _Tp, typename _Up>
+	constexpr _Tp
+	__ceil_impl(const _Tp& __t, const _Up& __u)
+	{
+	  return (__t < __u) ? (__t + _Tp{1}) : __t;
+	}
+
+      // C++11-friendly version of std::chrono::ceil<D> for internal use.
+      template<typename _ToDur, typename _Rep, typename _Period>
+	constexpr _ToDur
+	ceil(const duration<_Rep, _Period>& __d)
+	{
+	  return __detail::__ceil_impl(chrono::duration_cast<_ToDur>(__d), __d);
+	}
+    }
 #endif // C++17
 
     /// duration_values

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-09-11 17:22     ` Jonathan Wakely
@ 2020-09-11 18:59       ` Jonathan Wakely
  2020-09-19 10:37         ` libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]) Mike Crowe
  2020-09-19 10:50       ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
  1 sibling, 1 reply; 25+ messages in thread
From: Jonathan Wakely @ 2020-09-11 18:59 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7926 bytes --]

On 11/09/20 18:22 +0100, Jonathan Wakely wrote:
>On 11/09/20 15:41 +0100, Jonathan Wakely wrote:
>>On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>>>Convert the specified duration to the target clock's duration type
>>>before adding it to the current time in
>>>__atomic_futex_unsigned::_M_load_when_equal_for and
>>>_M_load_when_equal_until.  This removes the risk of the timeout being
>>>rounded down to the current time resulting in there being no wait at
>>>all when the duration type lacks sufficient precision to hold the
>>>steady_clock current time.
>>>
>>>Rather than using the style of fix from PR68519, let's expose the C++17
>>>std::chrono::ceil function as std::chrono::__detail::ceil so that it can
>>>be used in code compiled with earlier standards versions and simplify
>>>the fix. This was suggested by John Salmon in
>>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91486#c5 .
>>>
>>>This problem has become considerably less likely to trigger since I
>>>switched the __atomic__futex_unsigned::__clock_t reference clock from
>>>system_clock to steady_clock and added the loop, but the consequences of
>>>triggering it have changed too.
>>>
>>>By my calculations it takes just over 194 days from the epoch for the
>>>current time not to be representable in a float. This means that
>>>system_clock is always subject to the problem (with the standard 1970
>>>epoch) whereas steady_clock with float duration only runs out of
>>>resolution machine has been running for that long (assuming the Linux
>>>implementation of CLOCK_MONOTONIC.)
>>>
>>>The recently-added loop in
>>>__atomic_futex_unsigned::_M_load_when_equal_until turns this scenario
>>>into a busy wait.
>>>
>>>Unfortunately the combination of both of these things means that it's
>>>not possible to write a test case for this occurring in
>>>_M_load_when_equal_until as it stands.
>>>
>>>	* libstdc++-v3/include/std/chrono: (__detail::ceil) Move
>>>        implementation of std::chrono::ceil into private namespace so
>>>        that it's available to pre-C++17 code.
>>>
>>>	* libstdc++-v3/include/bits/atomic_futex.h:
>>>	  (__atomic_futex_unsigned::_M_load_when_equal_for,
>>>	  __atomic_futex_unsigned::_M_load_when_equal_until): Use
>>>	  __detail::ceil to convert delta to the reference clock
>>>	  duration type to avoid resolution problems
>>>
>>>	* libstdc++-v3/testsuite/30_threads/async/async.cc: (test_pr91486):
>>>        New test for __atomic_futex_unsigned::_M_load_when_equal_for.
>>>---
>>>libstdc++-v3/include/bits/atomic_futex.h         |  6 +++--
>>>libstdc++-v3/include/std/chrono                  | 19 +++++++++++++----
>>>libstdc++-v3/testsuite/30_threads/async/async.cc | 15 +++++++++++++-
>>>3 files changed, 34 insertions(+), 6 deletions(-)
>>>
>>>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>>>index 5f95ade..aa137a7 100644
>>>--- a/libstdc++-v3/include/bits/atomic_futex.h
>>>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>>>@@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>     _M_load_when_equal_for(unsigned __val, memory_order __mo,
>>>	  const chrono::duration<_Rep, _Period>& __rtime)
>>>     {
>>>+	using __dur = typename __clock_t::duration;
>>>	return _M_load_when_equal_until(__val, __mo,
>>>-					__clock_t::now() + __rtime);
>>>+		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
>>>     }
>>>
>>>   // Returns false iff a timeout occurred.
>>>@@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>	do {
>>>	  const __clock_t::time_point __s_entry = __clock_t::now();
>>>	  const auto __delta = __atime - __c_entry;
>>>-	  const auto __s_atime = __s_entry + __delta;
>>>+	  const auto __s_atime = __s_entry +
>>>+	      chrono::__detail::ceil<_Duration>(__delta);
>
>I'm testing the attached patch to fix the C++11 constexpr error, but
>while re-looking at the uses of __detail::ceil I noticed this is using
>_Duration as the target type. Shouldn't that be __clock_t::duration
>instead? Why do we care about the duration of the user's time_point
>here, rather than the preferred duration of the clock we're about to
>wait against?
>
>
>>>	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
>>>	    return true;
>>>	  __c_entry = _Clock::now();
>>>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>>>index 6d78f32..4257c7c 100644
>>>--- a/libstdc++-v3/include/std/chrono
>>>+++ b/libstdc++-v3/include/std/chrono
>>>@@ -299,6 +299,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>#endif
>>>#endif // C++20
>>>
>>>+    // We want to use ceil even when compiling for earlier standards versions
>>>+    namespace __detail
>>>+    {
>>>+      template<typename _ToDur, typename _Rep, typename _Period>
>>>+        constexpr __enable_if_is_duration<_ToDur>
>>>+        ceil(const duration<_Rep, _Period>& __d)
>>>+        {
>>>+	  auto __to = chrono::duration_cast<_ToDur>(__d);
>>>+	  if (__to < __d)
>>>+	    return __to + _ToDur{1};
>>>+	  return __to;
>>>+	}
>>>+    }
>>>+
>>>#if __cplusplus >= 201703L
>>># define __cpp_lib_chrono 201611
>>>
>>>@@ -316,10 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>     constexpr __enable_if_is_duration<_ToDur>
>>>     ceil(const duration<_Rep, _Period>& __d)
>>>     {
>>>-	auto __to = chrono::duration_cast<_ToDur>(__d);
>>>-	if (__to < __d)
>>>-	  return __to + _ToDur{1};
>>>-	return __to;
>>>+	return __detail::ceil<_ToDur>(__d);
>>
>>This isn't valid in C++11, a constexpr function needs to be just a
>>return statement. Fix incoming ...
>
>
>

>commit 2c56e931c694f98ba77c02163b69a62242f23a3b
>Author: Jonathan Wakely <jwakely@redhat.com>
>Date:   Fri Sep 11 18:09:46 2020
>
>    libstdc++: Fix chrono::__detail::ceil to work with C++11
>    
>    In C++11 constexpr functions can only have a return statement, so we
>    need to fix __detail::ceil to make it valid in C++11. This can be done
>    by moving the comparison and increment into a new function, __ceil_impl,
>    and calling that with the result of the duration_cast.
>    
>    This would mean the standard C++17 std::chrono::ceil function would make
>    two further calls, which would add too much overhead when not inlined.
>    For C++17 and later use a using-declaration to add chrono::ceil to
>    namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
>    as a C++11-compatible constexpr function template.
>    
>    libstdc++-v3/ChangeLog:
>    
>            * include/std/chrono [C++17] (chrono::__detail::ceil): Add
>            using declaration to make chrono::ceil available for internal
>            use with a consistent name.
>            (chrono::__detail::__ceil_impl): New function template.
>            (chrono::__detail::ceil): Use __ceil_impl to compare and
>            increment the value. Remove SFINAE constraint.
>
>diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
>index 893d1f6b2c9..2db2b7d0edc 100644
>--- a/libstdc++-v3/include/std/chrono
>+++ b/libstdc++-v3/include/std/chrono
>@@ -329,20 +329,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> #endif
> #endif // C++20
> 
>-    // We want to use ceil even when compiling for earlier standards versions
>-    namespace __detail
>-    {
>-      template<typename _ToDur, typename _Rep, typename _Period>
>-	constexpr __enable_if_is_duration<_ToDur>
>-	ceil(const duration<_Rep, _Period>& __d)
>-	{
>-	  auto __to = chrono::duration_cast<_ToDur>(__d);
>-	  if (__to < __d)
>-	    return __to + _ToDur{1};
>-	  return __to;
>-	}
>-    }
>-
> #if __cplusplus >= 201703L
> # define __cpp_lib_chrono 201611
> 
>@@ -360,7 +346,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       constexpr __enable_if_is_duration<_ToDur>
>       ceil(const duration<_Rep, _Period>& __d)
>       {
>-	return __detail::ceil<_ToDur>(__d);
>+	auto __to = chrono::duration_cast<_ToDur>(__d);
>+	if (__to < __d)
>+	  return __to - _ToDur{1};

Oops! That should be + not -

Committed with that fix.


[-- Attachment #2: patch.txt --]
[-- Type: text/x-patch, Size: 3379 bytes --]

commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Sep 11 19:59:11 2020

    libstdc++: Fix chrono::__detail::ceil to work with C++11
    
    In C++11 constexpr functions can only have a return statement, so we
    need to fix __detail::ceil to make it valid in C++11. This can be done
    by moving the comparison and increment into a new function, __ceil_impl,
    and calling that with the result of the duration_cast.
    
    This would mean the standard C++17 std::chrono::ceil function would make
    two further calls, which would add too much overhead when not inlined.
    For C++17 and later use a using-declaration to add chrono::ceil to
    namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
    as a C++11-compatible constexpr function template.
    
    libstdc++-v3/ChangeLog:
    
            * include/std/chrono [C++17] (chrono::__detail::ceil): Add
            using declaration to make chrono::ceil available for internal
            use with a consistent name.
            (chrono::__detail::__ceil_impl): New function template.
            (chrono::__detail::ceil): Use __ceil_impl to compare and
            increment the value. Remove SFINAE constraint.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 893d1f6b2c9..7539d7184ea 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -329,20 +329,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 #endif // C++20
 
-    // We want to use ceil even when compiling for earlier standards versions
-    namespace __detail
-    {
-      template<typename _ToDur, typename _Rep, typename _Period>
-	constexpr __enable_if_is_duration<_ToDur>
-	ceil(const duration<_Rep, _Period>& __d)
-	{
-	  auto __to = chrono::duration_cast<_ToDur>(__d);
-	  if (__to < __d)
-	    return __to + _ToDur{1};
-	  return __to;
-	}
-    }
-
 #if __cplusplus >= 201703L
 # define __cpp_lib_chrono 201611
 
@@ -360,7 +346,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       constexpr __enable_if_is_duration<_ToDur>
       ceil(const duration<_Rep, _Period>& __d)
       {
-	return __detail::ceil<_ToDur>(__d);
+	auto __to = chrono::duration_cast<_ToDur>(__d);
+	if (__to < __d)
+	  return __to + _ToDur{1};
+	return __to;
       }
 
     template <typename _ToDur, typename _Rep, typename _Period>
@@ -394,6 +383,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return __d;
 	return -__d;
       }
+
+    // Make chrono::ceil<D> also usable as chrono::__detail::ceil<D>.
+    namespace __detail { using chrono::ceil; }
+
+#else // ! C++17
+
+    // We want to use ceil even when compiling for earlier standards versions.
+    // C++11 only allows a single statement in a constexpr function, so we
+    // need to move the comparison into a separate function, __ceil_impl.
+    namespace __detail
+    {
+      template<typename _Tp, typename _Up>
+	constexpr _Tp
+	__ceil_impl(const _Tp& __t, const _Up& __u)
+	{
+	  return (__t < __u) ? (__t + _Tp{1}) : __t;
+	}
+
+      // C++11-friendly version of std::chrono::ceil<D> for internal use.
+      template<typename _ToDur, typename _Rep, typename _Period>
+	constexpr _ToDur
+	ceil(const duration<_Rep, _Period>& __d)
+	{
+	  return __detail::__ceil_impl(chrono::duration_cast<_ToDur>(__d), __d);
+	}
+    }
 #endif // C++17
 
     /// duration_values

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock
  2020-09-11  9:47   ` Jonathan Wakely
@ 2020-09-16 11:06     ` Mike Crowe
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-09-16 11:06 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
> > index 4375129..5f95ade 100644
> > --- a/libstdc++-v3/include/bits/atomic_futex.h
> > +++ b/libstdc++-v3/include/bits/atomic_futex.h
> > @@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >       _M_load_when_equal_until(unsigned __val, memory_order __mo,
> > 	  const chrono::time_point<_Clock, _Duration>& __atime)
> >       {
> > -	const typename _Clock::time_point __c_entry = _Clock::now();
> > -	const __clock_t::time_point __s_entry = __clock_t::now();
> > -	const auto __delta = __atime - __c_entry;
> > -	const auto __s_atime = __s_entry + __delta;
> > -	return _M_load_when_equal_until(__val, __mo, __s_atime);
> > +	typename _Clock::time_point __c_entry = _Clock::now();
> > +	do {
> > +	  const __clock_t::time_point __s_entry = __clock_t::now();
> > +	  const auto __delta = __atime - __c_entry;
> > +	  const auto __s_atime = __s_entry + __delta;
> > +	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
> > +	    return true;

On Friday 11 September 2020 at 10:47:30 +0100, Jonathan Wakely wrote:
> I wonder if we can avoid looping if _Clock::is_steady is true i.e.
> 
>           if _GLIBCXX_CONSTEXPR (_Clock::is_steady)
>             return false
> 
> If _Clock is steady then it can't be adjusted to move backwards. I am
> not 100% sure, but I don't think a steady clock can run "slow" either.
> But I'm checking with LWG about that. We should keep the loop for now.

The C++20 final draft (p1239) claims: "C1::is_steady [is] true if t1 <= t2
is always true and the time between clock ticks is constant, otherwise
false."

which doesn't appear to mean that the clock cannot run slow. All it means
is that if the clock runs slow then it must be consistently slow. However,
IMO since "time" is not absolute and the quoted text does not say which
clock the clock ticks need to be constant with reference to, it loses all
meaning if you try to read it too carefully!

> The reason for wanting to return early is that _Clock::now() can be
> quite expensive, so avoiding the second call if it's not needed would
> be significant.

It looks like[1] clock_gettime can be expensive even with vdso, and we have
no idea how much work the custom clock's now() method is actually doing, so
I think this is worth considering.

[1] https://stackoverflow.com/questions/45863729/clock-gettime-might-be-very-slow-even-using-vdso

Let's suppose I have a system with an appallingly bad local clock which
means that I can't really rely on steady_clock for long timeouts. I might
invent my own good_steady_clock that consults some other hardware that has
a reliable monotonic clock. The chances are that its now() method is quite
expensive. Based on the wording in the standard quoted above, I would feel
justified in marking my clock as is_steady=true yet it would be essential
for _M_load_when_equal_until to call _Clock::now() a second time to ensure
that the call didn't return early.

However, with the wording tweaked to say "the time between clock ticks is
equal to the time between steady_clock ticks" then
good_steady_clock::is_steady would need to be false and the optimisation
would be guaranteed to be safe. With this definition the only ways a custom
clock with is_steady=true could differ from steady_clock are in its epoch
and duration type.

Looking through the rest of libstdc++, I can only see the value of
is_steady being used once, and that's for the same optimisation in
this_thread::sleep_for. It looks like slow_clock used for the tests has
is_steady = false.

So, I think that we should apply the optimisation you've described and then
perhaps try to get the wording tweaked to clarify that this is acceptable.
If you're in agreement then perhaps I could have a go at doing that?

> Related to this, I've been experimenting with a change to wait_for
> that checks __rtime <= 0 and if so, uses __clock_t::time_point::min()
> for the wait_until call, so that we don't bother to call
> __clock_t::now(). For a non-positive duration we don't need to
> determine the precise time_point in the past, because it's in the past
> anyway. Using time_point::min() is as good as __clock_t::now() - rtime
> (as long as we don't overflow anything when converting it to a struct
> timespec, or course!)

That sounds good to me.

> Using future.wait_for(0s) to poll for a future becoming ready takes a
> long time currently: https://wandbox.org/permlink/Z1arsDE7eHW9JrqJ
> Using wait_until(C::time_point::min()) is faster, because it doesn't
> call C::now().
> 
> This probably isn't important for most timed waiting functions in the
> library, because I don't see any reason to use wait_for(0s) to poll a
> mutex, condition_variable or atomic. But it's reasonable to do for
> futures.

This is presumably necessary because std::future lacks a try_wait() or
is_ready() method.

> I already added (__rtime <= __rtime.zero()) to this_thread::sleep_for
> in d1a74a287ee1a84b90e5675904dac7f945cffca1.

I suppose the only downside to this is for those with the misguided belief
that sleep_for(0s) is equivalent to sched_yield(). I don't think that
matters.

> The extra branch to check rtime <= rtime.zero() or atime < C::now() is
> probably insignificant compared to the cost of unnecessary calls to
> now() on one or more clocks.

Agreed.

Thanks.

Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486])
  2020-09-11 18:59       ` Jonathan Wakely
@ 2020-09-19 10:37         ` Mike Crowe
  2020-10-05 10:32           ` Jonathan Wakely
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-09-19 10:37 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Friday 11 September 2020 at 19:59:36 +0100, Jonathan Wakely wrote:
> commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147
> Author: Jonathan Wakely <jwakely@redhat.com>
> Date:   Fri Sep 11 19:59:11 2020
> 
>     libstdc++: Fix chrono::__detail::ceil to work with C++11
>     
>     In C++11 constexpr functions can only have a return statement, so we
>     need to fix __detail::ceil to make it valid in C++11. This can be done
>     by moving the comparison and increment into a new function, __ceil_impl,
>     and calling that with the result of the duration_cast.
>     
>     This would mean the standard C++17 std::chrono::ceil function would make
>     two further calls, which would add too much overhead when not inlined.
>     For C++17 and later use a using-declaration to add chrono::ceil to
>     namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
>     as a C++11-compatible constexpr function template.
>
>
>     libstdc++-v3/ChangeLog:
>     
>             * include/std/chrono [C++17] (chrono::__detail::ceil): Add
>             using declaration to make chrono::ceil available for internal
>             use with a consistent name.
>             (chrono::__detail::__ceil_impl): New function template.
>             (chrono::__detail::ceil): Use __ceil_impl to compare and
>             increment the value. Remove SFINAE constraint.

This change introduces a new implementation of ceil that, as far as I can
tell, has no tests. A patch is attached to add the equivalent of the
existing chrono::ceil tests for chrono::__detail::ceil. The tests fail to
compile if I run them without 53ad6b1979f4bd7121e977c4a44151b14d8a0147 as
expected due to the previous non-C++11-compliant implementation.

Mike.

[-- Attachment #2: 0001-libstdc-Test-C-11-implementation-of-std-chrono-__det.patch --]
[-- Type: text/x-diff, Size: 3165 bytes --]

From b9dffbf4f1bc05a887269ea95a3b86d5a611e720 Mon Sep 17 00:00:00 2001
From: Mike Crowe <mac@mcrowe.com>
Date: Wed, 16 Sep 2020 15:31:28 +0100
Subject: [PATCH 1/2] libstdc++: Test C++11 implementation of
 std::chrono::__detail::ceil

Commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147 split the implementation
of std::chrono::__detail::ceil so that when compiling for C++17 and
later std::chrono::ceil is used but when compiling for earlier versions
a separate implementation is used to comply with C++11's limited
constexpr rules. Let's run the equivalent of the existing
std::chrono::ceil test cases on std::chrono::__detail::ceil too to make
sure that it doesn't get broken.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/duration_cast/rounding_c++11.cc: Copy
        rounding.cc and alter to support compilation for C++11 and to
        test std::chrono::__detail::ceil.
---
 .../20_util/duration_cast/rounding_c++11.cc   | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc

diff --git a/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
new file mode 100644
index 00000000000..f10d27fd082
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
@@ -0,0 +1,43 @@
+// Copyright (C) 2016-2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile { target c++11 } }
+
+#include <chrono>
+
+using std::chrono::seconds;
+using std::chrono::milliseconds;
+
+using fp_seconds = std::chrono::duration<float>;
+
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1000))
+	       == seconds(1) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1001))
+	       == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1500))
+	       == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1999))
+	       == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2000))
+	       == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2001))
+	       == seconds(3) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2500))
+	       == seconds(3) );
+static_assert( std::chrono::__detail::ceil<fp_seconds>(milliseconds(500))
+	       == fp_seconds{0.5f} );
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-09-11 17:22     ` Jonathan Wakely
  2020-09-11 18:59       ` Jonathan Wakely
@ 2020-09-19 10:50       ` Mike Crowe
  2020-10-05 10:33         ` Jonathan Wakely
  1 sibling, 1 reply; 25+ messages in thread
From: Mike Crowe @ 2020-09-19 10:50 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2154 bytes --]

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > > diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
> > > index 5f95ade..aa137a7 100644
> > > --- a/libstdc++-v3/include/bits/atomic_futex.h
> > > +++ b/libstdc++-v3/include/bits/atomic_futex.h
> > > @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >      _M_load_when_equal_for(unsigned __val, memory_order __mo,
> > > 	  const chrono::duration<_Rep, _Period>& __rtime)
> > >      {
> > > +	using __dur = typename __clock_t::duration;
> > > 	return _M_load_when_equal_until(__val, __mo,
> > > -					__clock_t::now() + __rtime);
> > > +		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
> > >      }
> > > 
> > >    // Returns false iff a timeout occurred.
> > > @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > 	do {
> > > 	  const __clock_t::time_point __s_entry = __clock_t::now();
> > > 	  const auto __delta = __atime - __c_entry;
> > > -	  const auto __s_atime = __s_entry + __delta;
> > > +	  const auto __s_atime = __s_entry +
> > > +	      chrono::__detail::ceil<_Duration>(__delta);

On Friday 11 September 2020 at 18:22:04 +0100, Jonathan Wakely wrote:
> I'm testing the attached patch to fix the C++11 constexpr error, but
> while re-looking at the uses of __detail::ceil I noticed this is using
> _Duration as the target type. Shouldn't that be __clock_t::duration
> instead? Why do we care about the duration of the user's time_point
> here, rather than the preferred duration of the clock we're about to
> wait against?

I think you're right. I've attached a patch to fix it and also add a test
that would have failed at least some of the time if run on a machine with
an uptime greater than 208.5 days with:

 void test_pr91486_wait_until(): Assertion 'float_steady_clock::call_count <= 3' failed.

If we implement the optimisation to not re-check against the custom clock
when the wait is complete if is_steady == true then the test would have
started failing due to the wait not being long enough.

(I used a couple of the GCC farm machines that have high uptimes to test
this.)

Thanks.

Mike.

[-- Attachment #2: 0002-libstdc-Use-correct-duration-for-atomic_futex-wait-o.patch --]
[-- Type: text/x-diff, Size: 5065 bytes --]

From fa4decc00698785fb9e07aa36c0d862414ca5ff9 Mon Sep 17 00:00:00 2001
From: Mike Crowe <mac@mcrowe.com>
Date: Wed, 16 Sep 2020 16:55:11 +0100
Subject: [PATCH 2/2] libstdc++: Use correct duration for atomic_futex wait on
 custom clock [PR 91486]

As Jonathan Wakely pointed out[1], my change in commit
f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
the target clock duration type rather than the input clock duration type
in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
condition_variable does.

As well as fixing this, let's create a rather contrived test that fails
with the previous code, but unfortunately only when run on a machine
with an uptime of over 208.5 days, and even then not always.

[1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html

libstdc++-v3/ChangeLog:

	* include/bits/atomic_futex.h:
        (__atomic_futex_unsigned::_M_load_when_equal_until): Use
        target clock duration type when rounding.

        * testsuite/30_threads/async/async.cc: (test_pr91486_wait_for)
        rename from test_pr91486.  (float_steady_clock) new class for
        test.  (test_pr91486_wait_until) new test.
---
 libstdc++-v3/include/bits/atomic_futex.h      |  2 +-
 .../testsuite/30_threads/async/async.cc       | 62 ++++++++++++++++++-
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index aa137a7b64e..6093be0fbc7 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -235,7 +235,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  const __clock_t::time_point __s_entry = __clock_t::now();
 	  const auto __delta = __atime - __c_entry;
 	  const auto __s_atime = __s_entry +
-	      chrono::__detail::ceil<_Duration>(__delta);
+	      chrono::__detail::ceil<__clock_t::duration>(__delta);
 	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
 	    return true;
 	  __c_entry = _Clock::now();
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 46f8d2f327d..1c779bfbcad 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -157,7 +157,7 @@ void test04()
   }
 }
 
-void test_pr91486()
+void test_pr91486_wait_for()
 {
   future<void> f1 = async(launch::async, []() {
       std::this_thread::sleep_for(std::chrono::seconds(1));
@@ -171,6 +171,63 @@ void test_pr91486()
   VERIFY( elapsed_steady >= std::chrono::seconds(1) );
 }
 
+// This is a clock with a very recent epoch which ensures that the difference
+// between now() and one second in the future is representable in a float so
+// that when the generic clock version of
+// __atomic_futex_unsigned::_M_load_when_equal_until calculates the delta it
+// gets a duration of 1.0f.  When chrono::steady_clock has moved sufficiently
+// far from its epoch (about 208.5 days in my testing - about 2^54ns because
+// there's a promotion to double happening too somewhere) adding 1.0f to the
+// current time has no effect.  Using this clock ensures that
+// __atomic_futex_unsigned::_M_load_when_equal_until is using
+// chrono::__detail::ceil correctly so that the function actually sleeps rather
+// than spinning.
+struct float_steady_clock
+{
+  using duration = std::chrono::duration<float>;
+  using rep = typename duration::rep;
+  using period = typename duration::period;
+  using time_point = std::chrono::time_point<float_steady_clock, duration>;
+  static constexpr bool is_steady = true;
+
+  static chrono::steady_clock::time_point epoch;
+  static int call_count;
+
+  static time_point now()
+  {
+    ++call_count;
+    auto real = std::chrono::steady_clock::now();
+    return time_point{real - epoch};
+  }
+};
+
+chrono::steady_clock::time_point float_steady_clock::epoch = chrono::steady_clock::now();
+int float_steady_clock::call_count = 0;
+
+void test_pr91486_wait_until()
+{
+  future<void> f1 = async(launch::async, []() {
+      std::this_thread::sleep_for(std::chrono::seconds(1));
+    });
+
+  float_steady_clock::time_point const now = float_steady_clock::now();
+
+  std::chrono::duration<float> const wait_time = std::chrono::seconds(1);
+  float_steady_clock::time_point const expire = now + wait_time;
+  VERIFY( expire > now );
+
+  auto const start_steady = chrono::steady_clock::now();
+  auto status = f1.wait_until(expire);
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+
+  // This checks that we didn't come back too soon
+  VERIFY( elapsed_steady >= std::chrono::seconds(1) );
+
+  // This checks that wait_until didn't busy wait checking the clock more times
+  // than necessary.
+  VERIFY( float_steady_clock::call_count <= 3 );
+}
+
 int main()
 {
   test01();
@@ -179,6 +236,7 @@ int main()
   test03<std::chrono::steady_clock>();
   test03<steady_clock_copy>();
   test04();
-  test_pr91486();
+  test_pr91486_wait_for();
+  test_pr91486_wait_until();
   return 0;
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486])
  2020-09-19 10:37         ` libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]) Mike Crowe
@ 2020-10-05 10:32           ` Jonathan Wakely
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-10-05 10:32 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 19/09/20 11:37 +0100, Mike Crowe wrote:
>On Friday 11 September 2020 at 19:59:36 +0100, Jonathan Wakely wrote:
>> commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147
>> Author: Jonathan Wakely <jwakely@redhat.com>
>> Date:   Fri Sep 11 19:59:11 2020
>>
>>     libstdc++: Fix chrono::__detail::ceil to work with C++11
>>
>>     In C++11 constexpr functions can only have a return statement, so we
>>     need to fix __detail::ceil to make it valid in C++11. This can be done
>>     by moving the comparison and increment into a new function, __ceil_impl,
>>     and calling that with the result of the duration_cast.
>>
>>     This would mean the standard C++17 std::chrono::ceil function would make
>>     two further calls, which would add too much overhead when not inlined.
>>     For C++17 and later use a using-declaration to add chrono::ceil to
>>     namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
>>     as a C++11-compatible constexpr function template.
>>
>>
>>     libstdc++-v3/ChangeLog:
>>
>>             * include/std/chrono [C++17] (chrono::__detail::ceil): Add
>>             using declaration to make chrono::ceil available for internal
>>             use with a consistent name.
>>             (chrono::__detail::__ceil_impl): New function template.
>>             (chrono::__detail::ceil): Use __ceil_impl to compare and
>>             increment the value. Remove SFINAE constraint.
>
>This change introduces a new implementation of ceil that, as far as I can
>tell, has no tests. A patch is attached to add the equivalent of the
>existing chrono::ceil tests for chrono::__detail::ceil. The tests fail to
>compile if I run them without 53ad6b1979f4bd7121e977c4a44151b14d8a0147 as
>expected due to the previous non-C++11-compliant implementation.

Pushed to master, thanks.

From b9dffbf4f1bc05a887269ea95a3b86d5a611e720 Mon Sep 17 00:00:00 2001
>From: Mike Crowe <mac@mcrowe.com>
>Date: Wed, 16 Sep 2020 15:31:28 +0100
>Subject: [PATCH 1/2] libstdc++: Test C++11 implementation of
> std::chrono::__detail::ceil
>
>Commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147 split the implementation
>of std::chrono::__detail::ceil so that when compiling for C++17 and
>later std::chrono::ceil is used but when compiling for earlier versions
>a separate implementation is used to comply with C++11's limited
>constexpr rules. Let's run the equivalent of the existing
>std::chrono::ceil test cases on std::chrono::__detail::ceil too to make
>sure that it doesn't get broken.
>
>libstdc++-v3/ChangeLog:
>
>	* testsuite/20_util/duration_cast/rounding_c++11.cc: Copy
>        rounding.cc and alter to support compilation for C++11 and to
>        test std::chrono::__detail::ceil.
>---
> .../20_util/duration_cast/rounding_c++11.cc   | 43 +++++++++++++++++++
> 1 file changed, 43 insertions(+)
> create mode 100644 libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
>
>diff --git a/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
>new file mode 100644
>index 00000000000..f10d27fd082
>--- /dev/null
>+++ b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
>@@ -0,0 +1,43 @@
>+// Copyright (C) 2016-2020 Free Software Foundation, Inc.
>+//
>+// This file is part of the GNU ISO C++ Library.  This library is free
>+// software; you can redistribute it and/or modify it under the
>+// terms of the GNU General Public License as published by the
>+// Free Software Foundation; either version 3, or (at your option)
>+// any later version.
>+
>+// This library is distributed in the hope that it will be useful,
>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+// GNU General Public License for more details.
>+
>+// You should have received a copy of the GNU General Public License along
>+// with this library; see the file COPYING3.  If not see
>+// <http://www.gnu.org/licenses/>.
>+
>+// { dg-options "-std=gnu++11" }
>+// { dg-do compile { target c++11 } }
>+
>+#include <chrono>
>+
>+using std::chrono::seconds;
>+using std::chrono::milliseconds;
>+
>+using fp_seconds = std::chrono::duration<float>;
>+
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1000))
>+	       == seconds(1) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1001))
>+	       == seconds(2) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1500))
>+	       == seconds(2) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1999))
>+	       == seconds(2) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2000))
>+	       == seconds(2) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2001))
>+	       == seconds(3) );
>+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2500))
>+	       == seconds(3) );
>+static_assert( std::chrono::__detail::ceil<fp_seconds>(milliseconds(500))
>+	       == fp_seconds{0.5f} );
>-- 
>2.28.0
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]
  2020-09-19 10:50       ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
@ 2020-10-05 10:33         ` Jonathan Wakely
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-10-05 10:33 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 19/09/20 11:50 +0100, Mike Crowe wrote:
>On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>> > > diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>> > > index 5f95ade..aa137a7 100644
>> > > --- a/libstdc++-v3/include/bits/atomic_futex.h
>> > > +++ b/libstdc++-v3/include/bits/atomic_futex.h
>> > > @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > >      _M_load_when_equal_for(unsigned __val, memory_order __mo,
>> > > 	  const chrono::duration<_Rep, _Period>& __rtime)
>> > >      {
>> > > +	using __dur = typename __clock_t::duration;
>> > > 	return _M_load_when_equal_until(__val, __mo,
>> > > -					__clock_t::now() + __rtime);
>> > > +		    __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
>> > >      }
>> > >
>> > >    // Returns false iff a timeout occurred.
>> > > @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > > 	do {
>> > > 	  const __clock_t::time_point __s_entry = __clock_t::now();
>> > > 	  const auto __delta = __atime - __c_entry;
>> > > -	  const auto __s_atime = __s_entry + __delta;
>> > > +	  const auto __s_atime = __s_entry +
>> > > +	      chrono::__detail::ceil<_Duration>(__delta);
>
>On Friday 11 September 2020 at 18:22:04 +0100, Jonathan Wakely wrote:
>> I'm testing the attached patch to fix the C++11 constexpr error, but
>> while re-looking at the uses of __detail::ceil I noticed this is using
>> _Duration as the target type. Shouldn't that be __clock_t::duration
>> instead? Why do we care about the duration of the user's time_point
>> here, rather than the preferred duration of the clock we're about to
>> wait against?
>
>I think you're right. I've attached a patch to fix it and also add a test
>that would have failed at least some of the time if run on a machine with
>an uptime greater than 208.5 days with:
>
> void test_pr91486_wait_until(): Assertion 'float_steady_clock::call_count <= 3' failed.
>
>If we implement the optimisation to not re-check against the custom clock
>when the wait is complete if is_steady == true then the test would have
>started failing due to the wait not being long enough.
>
>(I used a couple of the GCC farm machines that have high uptimes to test
>this.)

Also pushed to master. Thanks!


>Thanks.
>
>Mike.

From fa4decc00698785fb9e07aa36c0d862414ca5ff9 Mon Sep 17 00:00:00 2001
>From: Mike Crowe <mac@mcrowe.com>
>Date: Wed, 16 Sep 2020 16:55:11 +0100
>Subject: [PATCH 2/2] libstdc++: Use correct duration for atomic_futex wait on
> custom clock [PR 91486]
>
>As Jonathan Wakely pointed out[1], my change in commit
>f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
>the target clock duration type rather than the input clock duration type
>in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
>condition_variable does.
>
>As well as fixing this, let's create a rather contrived test that fails
>with the previous code, but unfortunately only when run on a machine
>with an uptime of over 208.5 days, and even then not always.
>
>[1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html
>
>libstdc++-v3/ChangeLog:
>
>	* include/bits/atomic_futex.h:
>        (__atomic_futex_unsigned::_M_load_when_equal_until): Use
>        target clock duration type when rounding.
>
>        * testsuite/30_threads/async/async.cc: (test_pr91486_wait_for)
>        rename from test_pr91486.  (float_steady_clock) new class for
>        test.  (test_pr91486_wait_until) new test.
>---
> libstdc++-v3/include/bits/atomic_futex.h      |  2 +-
> .../testsuite/30_threads/async/async.cc       | 62 ++++++++++++++++++-
> 2 files changed, 61 insertions(+), 3 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
>index aa137a7b64e..6093be0fbc7 100644
>--- a/libstdc++-v3/include/bits/atomic_futex.h
>+++ b/libstdc++-v3/include/bits/atomic_futex.h
>@@ -235,7 +235,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	  const __clock_t::time_point __s_entry = __clock_t::now();
> 	  const auto __delta = __atime - __c_entry;
> 	  const auto __s_atime = __s_entry +
>-	      chrono::__detail::ceil<_Duration>(__delta);
>+	      chrono::__detail::ceil<__clock_t::duration>(__delta);
> 	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
> 	    return true;
> 	  __c_entry = _Clock::now();
>diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
>index 46f8d2f327d..1c779bfbcad 100644
>--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
>+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
>@@ -157,7 +157,7 @@ void test04()
>   }
> }
> 
>-void test_pr91486()
>+void test_pr91486_wait_for()
> {
>   future<void> f1 = async(launch::async, []() {
>       std::this_thread::sleep_for(std::chrono::seconds(1));
>@@ -171,6 +171,63 @@ void test_pr91486()
>   VERIFY( elapsed_steady >= std::chrono::seconds(1) );
> }
> 
>+// This is a clock with a very recent epoch which ensures that the difference
>+// between now() and one second in the future is representable in a float so
>+// that when the generic clock version of
>+// __atomic_futex_unsigned::_M_load_when_equal_until calculates the delta it
>+// gets a duration of 1.0f.  When chrono::steady_clock has moved sufficiently
>+// far from its epoch (about 208.5 days in my testing - about 2^54ns because
>+// there's a promotion to double happening too somewhere) adding 1.0f to the
>+// current time has no effect.  Using this clock ensures that
>+// __atomic_futex_unsigned::_M_load_when_equal_until is using
>+// chrono::__detail::ceil correctly so that the function actually sleeps rather
>+// than spinning.
>+struct float_steady_clock
>+{
>+  using duration = std::chrono::duration<float>;
>+  using rep = typename duration::rep;
>+  using period = typename duration::period;
>+  using time_point = std::chrono::time_point<float_steady_clock, duration>;
>+  static constexpr bool is_steady = true;
>+
>+  static chrono::steady_clock::time_point epoch;
>+  static int call_count;
>+
>+  static time_point now()
>+  {
>+    ++call_count;
>+    auto real = std::chrono::steady_clock::now();
>+    return time_point{real - epoch};
>+  }
>+};
>+
>+chrono::steady_clock::time_point float_steady_clock::epoch = chrono::steady_clock::now();
>+int float_steady_clock::call_count = 0;
>+
>+void test_pr91486_wait_until()
>+{
>+  future<void> f1 = async(launch::async, []() {
>+      std::this_thread::sleep_for(std::chrono::seconds(1));
>+    });
>+
>+  float_steady_clock::time_point const now = float_steady_clock::now();
>+
>+  std::chrono::duration<float> const wait_time = std::chrono::seconds(1);
>+  float_steady_clock::time_point const expire = now + wait_time;
>+  VERIFY( expire > now );
>+
>+  auto const start_steady = chrono::steady_clock::now();
>+  auto status = f1.wait_until(expire);
>+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
>+
>+  // This checks that we didn't come back too soon
>+  VERIFY( elapsed_steady >= std::chrono::seconds(1) );
>+
>+  // This checks that wait_until didn't busy wait checking the clock more times
>+  // than necessary.
>+  VERIFY( float_steady_clock::call_count <= 3 );
>+}
>+
> int main()
> {
>   test01();
>@@ -179,6 +236,7 @@ int main()
>   test03<std::chrono::steady_clock>();
>   test03<steady_clock_copy>();
>   test04();
>-  test_pr91486();
>+  test_pr91486_wait_for();
>+  test_pr91486_wait_until();
>   return 0;
> }
>-- 
>2.28.0
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  2020-05-29  6:17 ` [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait Mike Crowe
@ 2020-11-12 23:07   ` Jonathan Wakely
  2020-11-13 21:58     ` Mike Crowe
  0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Wakely @ 2020-11-12 23:07 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>The futex system call supports waiting for an absolute time if
>FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
>benefits:
>
>1. The call to gettimeofday is not required in order to calculate a
>   relative timeout.
>
>2. If someone changes the system clock during the wait then the futex
>   timeout will correctly expire earlier or later.  Currently that only
>   happens if the clock is changed prior to the call to gettimeofday.
>
>According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
>v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
>that the code still works correctly with earlier kernel versions, an ENOSYS
>error from futex[1] results in the futex_clock_realtime_unavailable flag
>being set.  This flag is used to avoid the unnecessary unsupported futex
>call in the future and to fall back to the previous gettimeofday and
>relative time implementation.
>
>glibc applied an equivalent switch in pthread_cond_timedwait to use
>FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
>glibc-2.10 back in 2009.  See
>glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
>
>The futex_clock_realtime_unavailable flag is accessed using
>std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
>two calls to _M_futex_wait_until happen to happen simultaneously then the
>only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
>risk discovering that it doesn't work and, if so, both set the flag.
>
>[1] This is how glibc's nptl-init.c determines whether these flags are
>    supported.
>
>	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
>	futex flags.  Add futex_clock_realtime_unavailable flag to store
>	result of trying to use
>	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
>	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
>	fall back to using gettimeofday and FUTEX_WAIT if that's not
>	supported.

Mike,

I've been doing some performance comparisons and this patch seems to
make quite a big difference to code that polls a future by calling
fut.wait_until(t) using any t < now() as the timeout. For example,
fut.wait_until(chrono::system_clock::time_point{}) to wait until the
UNIX epoch.

With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
commented out) I see that polling take < 100ns. With the change, it
takes 3000ns or more.

Now this is still far better than polling using fut.wait_for(0s) which
takes around 50000ns due to the clock_gettime call, but I'm about to
fix that.

I'm not sure how important it is for wait_until(past) to be fast, but
the difference from 100ns to 3000ns seems significant. Do you see the
same kind of numbers? Is this just a property of the futex wait with
an absolute time?

N.B. using wait_until(system_clock::time_point::min()) or any other
time before the epoch doesn't work. The futex syscall returns EINVAL
which we don't check for. I'm about to fix that too.


> libstdc++-v3/src/c++11/futex.cc | 37 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 37 insertions(+)
>
>diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
>index c9de11a..25b3e05 100644
>--- a/libstdc++-v3/src/c++11/futex.cc
>+++ b/libstdc++-v3/src/c++11/futex.cc
>@@ -35,8 +35,16 @@
>
> // Constants for the wait/wake futex syscall operations
> const unsigned futex_wait_op = 0;
>+const unsigned futex_wait_bitset_op = 9;
>+const unsigned futex_clock_realtime_flag = 256;
>+const unsigned futex_bitset_match_any = ~0;
> const unsigned futex_wake_op = 1;
>
>+namespace
>+{
>+  std::atomic<bool> futex_clock_realtime_unavailable;
>+}
>+
> namespace std _GLIBCXX_VISIBILITY(default)
> {
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>@@ -58,6 +66,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       }
>     else
>       {
>+	if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
>+	  {
>+	    struct timespec rt;
>+	    rt.tv_sec = __s.count();
>+	    rt.tv_nsec = __ns.count();
>+	    if (syscall (SYS_futex, __addr,
>+			 futex_wait_bitset_op | futex_clock_realtime_flag,
>+			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
>+	      {
>+		__glibcxx_assert(errno == EINTR || errno == EAGAIN
>+				|| errno == ETIMEDOUT || errno == ENOSYS);
>+		if (errno == ETIMEDOUT)
>+		  return false;
>+		if (errno == ENOSYS)
>+		  {
>+		    futex_clock_realtime_unavailable.store(true,
>+						    std::memory_order_relaxed);
>+		    // Fall through to legacy implementation if the system
>+		    // call is unavailable.
>+		  }
>+		else
>+		  return true;
>+	      }
>+	    else
>+	      return true;
>+	  }
>+
>+	// We only get to here if futex_clock_realtime_unavailable was
>+	// true or has just been set to true.
> 	struct timeval tv;
> 	gettimeofday (&tv, NULL);
> 	// Convert the absolute timeout value to a relative timeout
>-- 
>git-series 0.9.1
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  2020-11-12 23:07   ` Jonathan Wakely
@ 2020-11-13 21:58     ` Mike Crowe
  2020-11-13 22:22       ` Jonathan Wakely
  2020-11-14 17:46       ` Mike Crowe
  0 siblings, 2 replies; 25+ messages in thread
From: Mike Crowe @ 2020-11-13 21:58 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Thursday 12 November 2020 at 23:07:47 +0000, Jonathan Wakely wrote:
> On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > The futex system call supports waiting for an absolute time if
> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
> > benefits:
> > 
> > 1. The call to gettimeofday is not required in order to calculate a
> >   relative timeout.
> > 
> > 2. If someone changes the system clock during the wait then the futex
> >   timeout will correctly expire earlier or later.  Currently that only
> >   happens if the clock is changed prior to the call to gettimeofday.
> > 
> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
> > that the code still works correctly with earlier kernel versions, an ENOSYS
> > error from futex[1] results in the futex_clock_realtime_unavailable flag
> > being set.  This flag is used to avoid the unnecessary unsupported futex
> > call in the future and to fall back to the previous gettimeofday and
> > relative time implementation.
> > 
> > glibc applied an equivalent switch in pthread_cond_timedwait to use
> > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
> > glibc-2.10 back in 2009.  See
> > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
> > 
> > The futex_clock_realtime_unavailable flag is accessed using
> > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
> > two calls to _M_futex_wait_until happen to happen simultaneously then the
> > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
> > risk discovering that it doesn't work and, if so, both set the flag.
> > 
> > [1] This is how glibc's nptl-init.c determines whether these flags are
> >    supported.
> > 
> > 	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
> > 	futex flags.  Add futex_clock_realtime_unavailable flag to store
> > 	result of trying to use
> > 	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
> > 	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
> > 	fall back to using gettimeofday and FUTEX_WAIT if that's not
> > 	supported.
> 
> Mike,
> 
> I've been doing some performance comparisons and this patch seems to
> make quite a big difference to code that polls a future by calling
> fut.wait_until(t) using any t < now() as the timeout. For example,
> fut.wait_until(chrono::system_clock::time_point{}) to wait until the
> UNIX epoch.
> 
> With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
> commented out) I see that polling take < 100ns. With the change, it
> takes 3000ns or more.
> 
> Now this is still far better than polling using fut.wait_for(0s) which
> takes around 50000ns due to the clock_gettime call, but I'm about to
> fix that.
> 
> I'm not sure how important it is for wait_until(past) to be fast, but
> the difference from 100ns to 3000ns seems significant. Do you see the
> same kind of numbers? Is this just a property of the futex wait with
> an absolute time?
> 
> N.B. using wait_until(system_clock::time_point::min()) or any other
> time before the epoch doesn't work. The futex syscall returns EINVAL
> which we don't check for. I'm about to fix that too.

I see similar behaviour. I suppose this is because the
gettimeofday/clock_gettime system calls are in the VDSO and therefore
usually much cheaper to call than the real system call SYS_futex.

If rather than bailing out early when the relative timeout is negative, I
call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
wait_until call takes about ten times longer than when using the absolute
SYS_futex. I can't really explain that.

Calling these functions with a time in the past is probably quite common if
you calculate a single timeout for several operations in sequence. What's
less clear is whether the performance matters that much when the return
value indicates a timeout anyway.

If gettimeofday/clock_gettime are cheap enough then I suppose we can call
them even in the absolute timeout case (losing benefit 1 above, which
appears to not really exist) to get the improved performance for timeouts
in the past whilst retaining the correct behaviour if the clock is warped
that this patch addressed (benefit 2 above.)

I'll try to come up with some standalone test cases with results for
further discussion. I suspect that the glibc people will be interested too.

Thanks for investigating this.

Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  2020-11-13 21:58     ` Mike Crowe
@ 2020-11-13 22:22       ` Jonathan Wakely
  2020-11-14 17:46       ` Mike Crowe
  1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Wakely @ 2020-11-13 22:22 UTC (permalink / raw)
  To: Mike Crowe; +Cc: libstdc++, gcc-patches

On 13/11/20 21:58 +0000, Mike Crowe via Libstdc++ wrote:
>On Thursday 12 November 2020 at 23:07:47 +0000, Jonathan Wakely wrote:
>> On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
>> > The futex system call supports waiting for an absolute time if
>> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
>> > benefits:
>> >
>> > 1. The call to gettimeofday is not required in order to calculate a
>> >   relative timeout.
>> >
>> > 2. If someone changes the system clock during the wait then the futex
>> >   timeout will correctly expire earlier or later.  Currently that only
>> >   happens if the clock is changed prior to the call to gettimeofday.
>> >
>> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
>> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
>> > that the code still works correctly with earlier kernel versions, an ENOSYS
>> > error from futex[1] results in the futex_clock_realtime_unavailable flag
>> > being set.  This flag is used to avoid the unnecessary unsupported futex
>> > call in the future and to fall back to the previous gettimeofday and
>> > relative time implementation.
>> >
>> > glibc applied an equivalent switch in pthread_cond_timedwait to use
>> > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
>> > glibc-2.10 back in 2009.  See
>> > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
>> >
>> > The futex_clock_realtime_unavailable flag is accessed using
>> > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
>> > two calls to _M_futex_wait_until happen to happen simultaneously then the
>> > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
>> > risk discovering that it doesn't work and, if so, both set the flag.
>> >
>> > [1] This is how glibc's nptl-init.c determines whether these flags are
>> >    supported.
>> >
>> > 	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
>> > 	futex flags.  Add futex_clock_realtime_unavailable flag to store
>> > 	result of trying to use
>> > 	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
>> > 	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
>> > 	fall back to using gettimeofday and FUTEX_WAIT if that's not
>> > 	supported.
>>
>> Mike,
>>
>> I've been doing some performance comparisons and this patch seems to
>> make quite a big difference to code that polls a future by calling
>> fut.wait_until(t) using any t < now() as the timeout. For example,
>> fut.wait_until(chrono::system_clock::time_point{}) to wait until the
>> UNIX epoch.
>>
>> With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
>> commented out) I see that polling take < 100ns. With the change, it
>> takes 3000ns or more.
>>
>> Now this is still far better than polling using fut.wait_for(0s) which
>> takes around 50000ns due to the clock_gettime call, but I'm about to
>> fix that.
>>
>> I'm not sure how important it is for wait_until(past) to be fast, but
>> the difference from 100ns to 3000ns seems significant. Do you see the
>> same kind of numbers? Is this just a property of the futex wait with
>> an absolute time?
>>
>> N.B. using wait_until(system_clock::time_point::min()) or any other
>> time before the epoch doesn't work. The futex syscall returns EINVAL
>> which we don't check for. I'm about to fix that too.
>
>I see similar behaviour. I suppose this is because the
>gettimeofday/clock_gettime system calls are in the VDSO and therefore
>usually much cheaper to call than the real system call SYS_futex.
>
>If rather than bailing out early when the relative timeout is negative, I
>call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
>wait_until call takes about ten times longer than when using the absolute
>SYS_futex. I can't really explain that.
>
>Calling these functions with a time in the past is probably quite common if
>you calculate a single timeout for several operations in sequence. What's
>less clear is whether the performance matters that much when the return
>value indicates a timeout anyway.
>
>If gettimeofday/clock_gettime are cheap enough then I suppose we can call
>them even in the absolute timeout case (losing benefit 1 above, which
>appears to not really exist) to get the improved performance for timeouts
>in the past whilst retaining the correct behaviour if the clock is warped
>that this patch addressed (benefit 2 above.)
>
>I'll try to come up with some standalone test cases with results for
>further discussion. I suspect that the glibc people will be interested too.

Thanks, that would be great. I have about twenty things on my plate
already.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  2020-11-13 21:58     ` Mike Crowe
  2020-11-13 22:22       ` Jonathan Wakely
@ 2020-11-14 17:46       ` Mike Crowe
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Crowe @ 2020-11-14 17:46 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6914 bytes --]

On Friday 13 November 2020 at 21:58:25 +0000, Mike Crowe via Libstdc++ wrote:
> On Thursday 12 November 2020 at 23:07:47 +0000, Jonathan Wakely wrote:
> > On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > > The futex system call supports waiting for an absolute time if
> > > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
> > > benefits:
> > > 
> > > 1. The call to gettimeofday is not required in order to calculate a
> > >   relative timeout.
> > > 
> > > 2. If someone changes the system clock during the wait then the futex
> > >   timeout will correctly expire earlier or later.  Currently that only
> > >   happens if the clock is changed prior to the call to gettimeofday.
> > > 
> > > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
> > > that the code still works correctly with earlier kernel versions, an ENOSYS
> > > error from futex[1] results in the futex_clock_realtime_unavailable flag
> > > being set.  This flag is used to avoid the unnecessary unsupported futex
> > > call in the future and to fall back to the previous gettimeofday and
> > > relative time implementation.
> > > 
> > > glibc applied an equivalent switch in pthread_cond_timedwait to use
> > > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
> > > glibc-2.10 back in 2009.  See
> > > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
> > > 
> > > The futex_clock_realtime_unavailable flag is accessed using
> > > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
> > > two calls to _M_futex_wait_until happen to happen simultaneously then the
> > > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
> > > risk discovering that it doesn't work and, if so, both set the flag.
> > > 
> > > [1] This is how glibc's nptl-init.c determines whether these flags are
> > >    supported.
> > > 
> > > 	* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
> > > 	futex flags.  Add futex_clock_realtime_unavailable flag to store
> > > 	result of trying to use
> > > 	FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
> > > 	Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
> > > 	fall back to using gettimeofday and FUTEX_WAIT if that's not
> > > 	supported.
> > 
> > Mike,
> > 
> > I've been doing some performance comparisons and this patch seems to
> > make quite a big difference to code that polls a future by calling
> > fut.wait_until(t) using any t < now() as the timeout. For example,
> > fut.wait_until(chrono::system_clock::time_point{}) to wait until the
> > UNIX epoch.
> > 
> > With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
> > commented out) I see that polling take < 100ns. With the change, it
> > takes 3000ns or more.
> > 
> > Now this is still far better than polling using fut.wait_for(0s) which
> > takes around 50000ns due to the clock_gettime call, but I'm about to
> > fix that.
> > 
> > I'm not sure how important it is for wait_until(past) to be fast, but
> > the difference from 100ns to 3000ns seems significant. Do you see the
> > same kind of numbers? Is this just a property of the futex wait with
> > an absolute time?
> > 
> > N.B. using wait_until(system_clock::time_point::min()) or any other
> > time before the epoch doesn't work. The futex syscall returns EINVAL
> > which we don't check for. I'm about to fix that too.
> 
> I see similar behaviour. I suppose this is because the
> gettimeofday/clock_gettime system calls are in the VDSO and therefore
> usually much cheaper to call than the real system call SYS_futex.
> 
> If rather than bailing out early when the relative timeout is negative, I
> call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
> wait_until call takes about ten times longer than when using the absolute
> SYS_futex. I can't really explain that.
> 
> Calling these functions with a time in the past is probably quite common if
> you calculate a single timeout for several operations in sequence. What's
> less clear is whether the performance matters that much when the return
> value indicates a timeout anyway.
> 
> If gettimeofday/clock_gettime are cheap enough then I suppose we can call
> them even in the absolute timeout case (losing benefit 1 above, which
> appears to not really exist) to get the improved performance for timeouts
> in the past whilst retaining the correct behaviour if the clock is warped
> that this patch addressed (benefit 2 above.)

I wrote the attached standalone program to measure the relative performance
of wait operations in the past (or with zero timeout in the relative case)
and ran it on a variety of machines. The results below are in nanoseconds:

|--------------------+---------+----------+-----------+----------+---------|
|                    |  Kernel |    futex |     futex |    futex |   clock |
| CPU                | version | realtime | monotonic | relative | gettime |
|--------------------+---------+----------+-----------+----------+---------|
| x86_64 E5-2690 v2  |    4.19 |     6942 |      6675 |    61175 |      85 |
| x86_64 i7-10510U   |     5.4 |    27950 |     36650 |    69969 |     433 |
| x86_64 i7-3770K    |     5.9 |    17152 |     17232 |    59827 |     322 |
| x86_64 i7-4790K    |    4.19 |    13280 |     12219 |    58225 |     413 |
| x86 Celeron G1610T |     4.9 |    18245 |     18626 |    58445 |     407 |
| Raspberry Pi 3     |     5.9 |    30765 |     30851 |    72776 |     300 |
| Raspberry Pi 2     |     5.4 |    23830 |     24104 |    91539 |    1062 |
| mips64 gcc24       |    4.19 |    23102 |     23503 |    69343 |    1236 |
| sparc32 gcc102     |     5.9 |    42657 |     39306 |    87568 |     688 |
|--------------------+---------+----------+-----------+----------+---------|

The first machine is virtual on ESXi (it appears that Xeons really are much
faster at this stuff!) The last two machines are .fsffrance.org GCC farm
machines.

The pthread_cond_timedwait durations for CLOCK_REALTIME were generally a
little bit better than the futex realtime durations.

The pthread_cond_timedwait durations for CLOCK_MONOTONIC differed greatly
by glibc version. With glibc v2.28 (from Debian 10) it completed very
quickly, but with glibc v2.31 (from Ubuntu 20.04) it took slightly longer
than the realtime version. This is presumably because glibc v2.28 would use
a relative timeout in this case and realise that there was no point it
calling futex - I changed that in
glibc:99d01ffcc386d1bfb681fb0684fcf6a6a996beb3.

So, it's clear to me that my changes have caused an absolute wait on a time
in the past to take longer in both libstdc++ and glibc. The question now is
whether that matters to anyone. I'll take my findings to the glibc list and
see what they think.

Thanks.

Mike.

[-- Attachment #2: test.cpp --]
[-- Type: text/x-c++src, Size: 5144 bytes --]

#include <linux/futex.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <stdlib.h>
#include <unistd.h>
#include <cerrno>
#include <chrono>
#include <cstdio>
#include <cinttypes>
#include <thread>

const unsigned futex_wait_op = 0;
const unsigned futex_wait_bitset_op = 9;
const unsigned futex_clock_monotonic_flag = 0;
const unsigned futex_clock_realtime_flag = 256;
const unsigned futex_bitset_match_any = ~0;
const unsigned futex_wake_op = 1;

static int futex(int *uaddr, int futex_op, int val,
                 const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr, val3);
}

std::chrono::nanoseconds test_futex_realtime()
{
    int word = 1;

    struct timespec timeout{0,0};

    const auto start = std::chrono::steady_clock::now();

    int rc = futex(&word, futex_wait_bitset_op | futex_clock_realtime_flag,
                   1, &timeout, nullptr, futex_bitset_match_any);
    if (rc != -1)
    {
        fprintf(stderr, "Unexpected return value from futex: %d\n", rc);
        exit(1);
    }

    if (errno != ETIMEDOUT)
    {
        fprintf(stderr, "Unexpected error from futex: %m\n");
        exit(1);
    }

    const auto duration = std::chrono::steady_clock::now() - start;
    return duration;
}

std::chrono::nanoseconds test_futex_monotonic()
{
    int word = 1;

    struct timespec timeout{0,0};

    const auto start = std::chrono::steady_clock::now();

    int rc = futex(&word, futex_wait_bitset_op | futex_clock_monotonic_flag,
                   1, &timeout, nullptr, futex_bitset_match_any);
    if (rc != -1)
    {
        fprintf(stderr, "Unexpected return value from futex: %d\n", rc);
        exit(1);
    }

    if (errno != ETIMEDOUT)
    {
        fprintf(stderr, "Unexpected error from futex: %m\n");
        exit(1);
    }

    const auto duration = std::chrono::steady_clock::now() - start;
    return duration;
}

std::chrono::nanoseconds test_futex_relative()
{
    int word = 1;

    struct timespec timeout{0,0};

    const auto start = std::chrono::steady_clock::now();

    int rc = futex(&word, futex_wait_op,
                   1, &timeout, nullptr, 0);
    if (rc != -1)
    {
        fprintf(stderr, "Unexpected return value from futex: %d\n", rc);
        exit(1);
    }

    if (errno != ETIMEDOUT)
    {
        fprintf(stderr, "Unexpected error from futex: %m\n");
        exit(1);
    }

    const auto duration = std::chrono::steady_clock::now() - start;
    return duration;
}

std::chrono::nanoseconds test_clock_gettime()
{
    const auto start = std::chrono::steady_clock::now();

    struct timespec timeout;
    clock_gettime(CLOCK_REALTIME, &timeout);

    const auto duration = std::chrono::steady_clock::now() - start;

    return duration;
}

pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond_realtime = PTHREAD_COND_INITIALIZER;

std::chrono::nanoseconds test_cond_realtime()
{
    int rc = pthread_mutex_lock(&mut);
    if (rc != 0)
    {
        fprintf(stderr, "pthread_mutex_lock: %m\n");
        exit(1);
    }

    const auto start = std::chrono::steady_clock::now();

    struct timespec timeout{0,0};
    rc = pthread_cond_timedwait(&cond_realtime, &mut, &timeout);
    if (rc != ETIMEDOUT)
    {
        fprintf(stderr, "Unexpected return value from cond wait: %d\n", rc);
        exit(1);
    }

    const auto duration = std::chrono::steady_clock::now() - start;

    pthread_mutex_unlock(&mut);

    return duration;
}

pthread_cond_t cond_monotonic;

std::chrono::nanoseconds test_cond_monotonic()
{
    int rc = pthread_mutex_lock(&mut);
    if (rc != 0)
    {
        fprintf(stderr, "pthread_mutex_lock: %m\n");
        exit(1);
    }

    const auto start = std::chrono::steady_clock::now();

    struct timespec timeout{0,0};
    rc = pthread_cond_timedwait(&cond_monotonic, &mut, &timeout);
    if (rc != ETIMEDOUT)
    {
        fprintf(stderr, "Unexpected return value from cond wait: %d\n", rc);
        exit(1);
    }

    const auto duration = std::chrono::steady_clock::now() - start;

    pthread_mutex_unlock(&mut);

    return duration;
}

void show_mean(const char *name, std::chrono::nanoseconds (*f)())
{
    // warm up
    for(int i = 0; i < 10; ++i)
        f();

    // calculate mean
    const int count = 5000;
    std::chrono::nanoseconds total{0};
    for(int i = 0; i < count; ++i)
    {
        total += f();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    printf("%s mean duration %" PRId64 "ns\n", name, (total/count).count());
}

int main()
{
    pthread_condattr_t condattr_monotonic;
    pthread_condattr_init(&condattr_monotonic);
    pthread_condattr_setclock(&condattr_monotonic, CLOCK_MONOTONIC);
    pthread_cond_init(&cond_monotonic, &condattr_monotonic);

    show_mean("futex_realtime", test_futex_realtime);
    show_mean("futex_monotonic", test_futex_monotonic);
    show_mean("futex_relative", test_futex_relative);
    show_mean("clock_gettime", test_clock_gettime);
    show_mean("cond_realtime", test_cond_realtime);
    show_mean("cond_monotonic", test_cond_monotonic);
}

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-11-14 17:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-29  6:17 [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
2020-05-29  6:17 ` [PATCH v5 1/8] libstdc++: Improve async test Mike Crowe
2020-05-29  6:17 ` [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait Mike Crowe
2020-11-12 23:07   ` Jonathan Wakely
2020-11-13 21:58     ` Mike Crowe
2020-11-13 22:22       ` Jonathan Wakely
2020-11-14 17:46       ` Mike Crowe
2020-05-29  6:17 ` [PATCH v5 3/8] libstdc++ futex: Support waiting on std::chrono::steady_clock directly Mike Crowe
2020-05-29  6:17 ` [PATCH v5 4/8] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock Mike Crowe
2020-05-29  6:17 ` [PATCH v5 5/8] libstdc++ futex: Loop when waiting against arbitrary clock Mike Crowe
2020-09-11  9:47   ` Jonathan Wakely
2020-09-16 11:06     ` Mike Crowe
2020-05-29  6:17 ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
2020-09-11 14:40   ` Jonathan Wakely
2020-09-11 17:22     ` Jonathan Wakely
2020-09-11 18:59       ` Jonathan Wakely
2020-09-19 10:37         ` libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]) Mike Crowe
2020-10-05 10:32           ` Jonathan Wakely
2020-09-19 10:50       ` [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486] Mike Crowe
2020-10-05 10:33         ` Jonathan Wakely
2020-05-29  6:17 ` [PATCH v5 7/8] libstdc++ condition_variable: Avoid rounding errors on custom clocks Mike Crowe
2020-05-29  6:17 ` [PATCH v5 8/8] libstdc++: Extra async tests, not for merging Mike Crowe
2020-07-17  8:39 ` [PATCH v5 0/8] std::future::wait_* and std::condition_variable improvements Mike Crowe
2020-07-17  9:34   ` Jonathan Wakely
2020-09-11 13:26 ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).