public inbox for libstdc++-cvs@sourceware.org
* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-06 15:03 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-06 15:03 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:a55f657f200e66f299ce17254410f88ab3755611

commit a55f657f200e66f299ce17254410f88ab3755611
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  The challenge is to
    do so without runtime overhead, because this is not about checking
    that it's in range, it's only about telling the compiler about it.
    
    This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
    optimizing, expands to code that invokes undefined behavior if the
    expression doesn't hold, so that the compiler optimizes out the
    test and the entire branch containing it, while retaining enough
    information about the paths that must not be taken to optimize the
    remaining paths based on the assumption.
    
    I also introduce a member function in bit iterators that conveys
    to the compiler the assumption that the offset is in range, and
    add calls to it throughout member functions of bit iterators that
    might not otherwise know that the offsets have to be in range, so
    that the compiler no longer makes pessimistic decisions or fails
    to optimize out cases that it could.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    
    for  libstdc++-v3/ChangeLog
    
            * include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
            (_Bit_iterator_base): Add _M_normalized_p and
            _M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
            _M_incr, operator==, operator<=>, operator<, and operator-.
            (_Bit_iterator): Also use them in operator*.
            (_Bit_const_iterator): Likewise.
    
    Issue: gcc#155
    TN: W517-007
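
The mechanism the macro relies on can be sketched standalone (BUILTIN_ASSUME
and wrap_bit_offset are illustrative names, not the library's code; GCC or
Clang is assumed for __builtin_unreachable):

```cpp
#include <cassert>

// Sketch of the technique: on the failure path, a store through a
// null pointer followed by __builtin_unreachable() is undefined
// behavior, so an optimizing compiler deletes the test and the whole
// branch, yet records that (expr) holds on every remaining path.
#define BUILTIN_ASSUME(expr)			\
  do						\
    if (!(expr))				\
      {						\
	void **assume_failed = 0;		\
	*assume_failed = 0;			\
	__builtin_unreachable ();		\
      }						\
  while (0)

// Once the offset is known to be below the word width, the compiler
// can fold the masking away and drop out-of-range branches entirely.
inline unsigned
wrap_bit_offset (unsigned ofst)
{
  BUILTIN_ASSUME (ofst < 64);
  return ofst % 64;  // foldable to plain ofst under the assumption
}
```

At -O2 the modulo disappears from the generated code; at -O0 the test is
still compiled, but it is never taken for in-range inputs.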

Diff:
---
 libstdc++-v3/include/bits/stl_bvector.h | 72 +++++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd4..b0c2e8f4fa9 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,52 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _Bit_type * _M_p;
     unsigned int _M_offset;
 
+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (expr) fails, invoke undefined behavior, so that
+// the test and the failure block get optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, the
+// expression is still compiled, but it's never executed.
+#if __cpp_lib_is_constant_evaluated
+#define __GLIBCXX_BUILTIN_ASSUME(expr)			\
+    do							\
+      if (!(expr) && std::is_constant_evaluated ())	\
+	{						\
+	  void **__assert_failed = 0;			\
+	  *__assert_failed = 0;				\
+	  __builtin_unreachable ();			\
+	}						\
+    while (0)
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    do						\
+      if (!(expr))				\
+	{					\
+	  void **__assert_failed = 0;		\
+	  *__assert_failed = 0;			\
+	  __builtin_unreachable ();		\
+	}					\
+    while (0)
+#endif
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    (void)(false && (expr))
+#endif
+
+    _GLIBCXX20_CONSTEXPR
+    bool
+    _M_normalized_p() const
+    {
+      return (_M_offset < unsigned(_S_word_bit));
+    }
+
+    _GLIBCXX20_CONSTEXPR
+    void
+    _M_assume_normalized() const
+    {
+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
+    }
+
     _GLIBCXX20_CONSTEXPR
     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
     : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +231,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_up()
     {
+      _M_assume_normalized();
       if (_M_offset++ == int(_S_word_bit) - 1)
 	{
 	  _M_offset = 0;
@@ -196,6 +243,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_down()
     {
+      _M_assume_normalized();
       if (_M_offset-- == 0)
 	{
 	  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +255,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_incr(ptrdiff_t __i)
     {
+      _M_assume_normalized();
       difference_type __n = __i + _M_offset;
       _M_p += __n / int(_S_word_bit);
       __n = __n % int(_S_word_bit);
@@ -221,7 +270,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD
     friend _GLIBCXX20_CONSTEXPR bool
     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }
 
 #if __cpp_lib_three_way_comparison
     [[nodiscard]]
@@ -229,6 +282,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     noexcept
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
 	return __cmp;
       return __x._M_offset <=> __y._M_offset;
@@ -238,6 +293,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend bool
     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return __x._M_p < __y._M_p
 	    || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
     }
@@ -266,6 +323,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      // Make _M_offset's range visible to optimizers.
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
 	      + __x._M_offset - __y._M_offset);
     }
@@ -297,7 +357,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     reference
     operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     iterator&
@@ -408,7 +471,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     const_reference
     operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     const_iterator&

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-09  1:57 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-09  1:57 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:e84bca033f18ff6375addb1e4b976a5afccb16c1

commit e84bca033f18ff6375addb1e4b976a5afccb16c1
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  Using attribute
    assume (_M_offset <= ...) didn't work, because gimple lowered that to
    something that vrp could only use to ensure 'this' was non-NULL.
    Exposing _M_offset as an automatic variable/gimple register outside
    the unevaluated assume operand enabled the optimizer to do its job.
    
    Rather than placing such load-then-assume constructs all over, I
    introduced an always-inline member function in bit iterators that does
    the job of conveying to the compiler the information that the
    assumption is supposed to hold, and various calls throughout functions
    pertaining to bit iterators that might not otherwise know that the
    offsets have to be in range, so that the compiler no longer needs to
    make conservative assumptions that prevent optimizations.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    
    for  libstdc++-v3/ChangeLog
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (_Bit_iterator_base): Add
            _M_assume_normalized member function.  Call it in _M_bump_up,
            _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
            operator-.
            (_Bit_iterator): Also call it in operator*.
            (_Bit_const_iterator): Likewise.
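
The load-then-assume pattern can be sketched standalone (ASSUME and bit_pos
are illustrative names, not the library's code; the __assume__ statement
attribute is a GCC 13 feature, so a fallback with equivalent effect is
provided for other compilers):

```cpp
#include <cassert>

#if defined(__GNUC__) && __GNUC__ >= 13 && !defined(__clang__)
// GCC 13+ statement attribute: the operand is unevaluated, so the
// assumption carries no runtime cost.
# define ASSUME(expr) __attribute__ ((__assume__ (expr)))
#else
// Fallback for older GCC and for Clang.
# define ASSUME(expr) do { if (!(expr)) __builtin_unreachable (); } while (0)
#endif

struct bit_pos
{
  unsigned offset;  // invariant: always below the word bit width

  unsigned
  wrapped () const
  {
    // Copy the field into a local first: as the commit message notes,
    // assuming directly on the member only let VRP prove 'this' was
    // non-null, whereas a local becomes a gimple register whose range
    // the optimizer can actually use.
    unsigned local = offset;
    ASSUME (local < 64);
    return offset % 64;  // foldable to plain offset under the assumption
  }
};
```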

Diff:
---
 libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd4..2b91af2005f 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -56,6 +56,10 @@
 #ifndef _STL_BVECTOR_H
 #define _STL_BVECTOR_H 1
 
+#ifndef _GLIBCXX_ALWAYS_INLINE
+#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
+#endif
+
 #if __cplusplus >= 201103L
 #include <initializer_list>
 #include <bits/functional_hash.h>
@@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _Bit_type * _M_p;
     unsigned int _M_offset;
 
+    _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
+    void
+    _M_assume_normalized() const
+    {
+      unsigned int __ofst = _M_offset;
+      __attribute__ ((__assume__ (__ofst < unsigned(_S_word_bit))));
+    }
+
     _GLIBCXX20_CONSTEXPR
     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
     : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_up()
     {
+      _M_assume_normalized();
       if (_M_offset++ == int(_S_word_bit) - 1)
 	{
 	  _M_offset = 0;
@@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_down()
     {
+      _M_assume_normalized();
       if (_M_offset-- == 0)
 	{
 	  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_incr(ptrdiff_t __i)
     {
+      _M_assume_normalized();
       difference_type __n = __i + _M_offset;
       _M_p += __n / int(_S_word_bit);
       __n = __n % int(_S_word_bit);
@@ -221,7 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD
     friend _GLIBCXX20_CONSTEXPR bool
     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }
 
 #if __cpp_lib_three_way_comparison
     [[nodiscard]]
@@ -229,6 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     noexcept
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
 	return __cmp;
       return __x._M_offset <=> __y._M_offset;
@@ -238,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend bool
     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return __x._M_p < __y._M_p
 	    || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
     }
@@ -266,6 +289,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
 	      + __x._M_offset - __y._M_offset);
     }
@@ -297,7 +322,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     reference
     operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     iterator&
@@ -408,7 +436,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     const_reference
     operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     const_iterator&


* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-08 23:48 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-08 23:48 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:b6635cb0fcea4a0e8ef1d48612426ce32af4d837

commit b6635cb0fcea4a0e8ef1d48612426ce32af4d837
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  The challenge is to
    do so without runtime overhead, because this is not about checking
    that it's in range, it's only about telling the compiler about it.
    
    This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
    optimizing, expands to code that invokes undefined behavior if the
    expression doesn't hold, so that the compiler optimizes out the
    test and the entire branch containing it, while retaining enough
    information about the paths that must not be taken to optimize the
    remaining paths based on the assumption.
    
    I also introduce a member function in bit iterators that conveys
    to the compiler the assumption that the offset is in range, and
    add calls to it throughout member functions of bit iterators that
    might not otherwise know that the offsets have to be in range, so
    that the compiler no longer makes pessimistic decisions or fails
    to optimize out cases that it could.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    
    for  libstdc++-v3/ChangeLog
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
            (_Bit_iterator_base): Add _M_normalized_p and
            _M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
            _M_incr, operator==, operator<=>, operator<, and operator-.
            (_Bit_iterator): Also use them in operator*.
            (_Bit_const_iterator): Likewise.

Diff:
---
 libstdc++-v3/include/bits/stl_bvector.h | 75 +++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd4..81b31684645 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _Bit_type * _M_p;
     unsigned int _M_offset;
 
+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (EXPR) fails, invoke undefined behavior, so that
+// the test and the failure block get optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, the
+// expression is still compiled, but it's never executed.
+#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    do						\
+      if (std::is_constant_evaluated ())	\
+	static_assert(expr);			\
+      else if (!(expr))				\
+	{					\
+	  void **__assert_failed = 0;		\
+	  *__assert_failed = 0;			\
+	  __builtin_unreachable ();		\
+	}					\
+    while (0)
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    do						\
+      if (!(expr))				\
+	{					\
+	  void **__assert_failed = 0;		\
+	  *__assert_failed = 0;			\
+	  __builtin_unreachable ();		\
+	}					\
+    while (0)
+#endif
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    (void)(false && (expr))
+#endif
+
+    _GLIBCXX20_CONSTEXPR
+    bool
+    _M_normalized_p() const
+    {
+      return (_M_offset < unsigned(_S_word_bit));
+    }
+
+    _GLIBCXX20_CONSTEXPR
+    void
+    _M_assume_normalized() const
+    {
+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
+    }
+
     _GLIBCXX20_CONSTEXPR
     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
     : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_up()
     {
+      _M_assume_normalized();
       if (_M_offset++ == int(_S_word_bit) - 1)
 	{
 	  _M_offset = 0;
@@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_down()
     {
+      _M_assume_normalized();
       if (_M_offset-- == 0)
 	{
 	  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_incr(ptrdiff_t __i)
     {
+      _M_assume_normalized();
       difference_type __n = __i + _M_offset;
       _M_p += __n / int(_S_word_bit);
       __n = __n % int(_S_word_bit);
@@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD
     friend _GLIBCXX20_CONSTEXPR bool
     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }
 
 #if __cpp_lib_three_way_comparison
     [[nodiscard]]
@@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     noexcept
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
 	return __cmp;
       return __x._M_offset <=> __y._M_offset;
@@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend bool
     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return __x._M_p < __y._M_p
 	    || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
     }
@@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      // Make _M_offset's range visible to optimizers.
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
 	      + __x._M_offset - __y._M_offset);
     }
@@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     reference
     operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     iterator&
@@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     const_reference
     operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     const_iterator&


* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-08 19:22 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-08 19:22 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:01d296f6afef16aee8301d871b0253d40d307b3c

commit 01d296f6afef16aee8301d871b0253d40d307b3c
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]
    
    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.
    
    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.
    
    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.
    
    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.
    
    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  The challenge is to
    do so without runtime overhead, because this is not about checking
    that it's in range, it's only about telling the compiler about it.
    
    This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
    optimizing, expands to code that invokes undefined behavior if the
    expression doesn't hold, so that the compiler optimizes out the
    test and the entire branch containing it, while retaining enough
    information about the paths that must not be taken to optimize the
    remaining paths based on the assumption.
    
    I also introduce a member function in bit iterators that conveys
    to the compiler the assumption that the offset is in range, and
    add calls to it throughout member functions of bit iterators that
    might not otherwise know that the offsets have to be in range, so
    that the compiler no longer makes pessimistic decisions or fails
    to optimize out cases that it could.
    
    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.
    
    
    for  libstdc++-v3/ChangeLog
    
            PR libstdc++/110807
            * include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
            (_Bit_iterator_base): Add _M_normalized_p and
            _M_assume_normalized.  Use them in _M_bump_up, _M_bump_down,
            _M_incr, operator==, operator<=>, operator<, and operator-.
            (_Bit_iterator): Also use them in operator*.
            (_Bit_const_iterator): Likewise.

Diff:
---
 libstdc++-v3/include/bits/stl_bvector.h | 75 +++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd4..81b31684645 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _Bit_type * _M_p;
     unsigned int _M_offset;
 
+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
+// If the assumption (EXPR) fails, invoke undefined behavior, so that
+// the test and the failure block get optimized out, but the compiler
+// still recalls that (expr) can be taken for granted.  Use this only
+// for expressions that are simple and explicit enough that the
+// compiler can optimize based on them.  When not optimizing, the
+// expression is still compiled, but it's never executed.
+#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    do						\
+      if (std::is_constant_evaluated ())	\
+	static_assert(expr);			\
+      else if (!(expr))				\
+	{					\
+	  void **__assert_failed = 0;		\
+	  *__assert_failed = 0;			\
+	  __builtin_unreachable ();		\
+	}					\
+    while (0)
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    do						\
+      if (!(expr))				\
+	{					\
+	  void **__assert_failed = 0;		\
+	  *__assert_failed = 0;			\
+	  __builtin_unreachable ();		\
+	}					\
+    while (0)
+#endif
+#else
+#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
+    (void)(false && (expr))
+#endif
+
+    _GLIBCXX20_CONSTEXPR
+    bool
+    _M_normalized_p() const
+    {
+      return (_M_offset < unsigned(_S_word_bit));
+    }
+
+    _GLIBCXX20_CONSTEXPR
+    void
+    _M_assume_normalized() const
+    {
+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
+    }
+
     _GLIBCXX20_CONSTEXPR
     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
     : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_up()
     {
+      _M_assume_normalized();
       if (_M_offset++ == int(_S_word_bit) - 1)
 	{
 	  _M_offset = 0;
@@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_down()
     {
+      _M_assume_normalized();
       if (_M_offset-- == 0)
 	{
 	  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_incr(ptrdiff_t __i)
     {
+      _M_assume_normalized();
       difference_type __n = __i + _M_offset;
       _M_p += __n / int(_S_word_bit);
       __n = __n % int(_S_word_bit);
@@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD
     friend _GLIBCXX20_CONSTEXPR bool
     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }
 
 #if __cpp_lib_three_way_comparison
     [[nodiscard]]
@@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     noexcept
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
 	return __cmp;
       return __x._M_offset <=> __y._M_offset;
@@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend bool
     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return __x._M_p < __y._M_p
 	    || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
     }
@@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      // Make _M_offset's range visible to optimizers.
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
 	      + __x._M_offset - __y._M_offset);
     }
@@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     reference
     operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     iterator&
@@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     const_reference
     operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     const_iterator&

^ permalink raw reply	[flat|nested] 6+ messages in thread
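
The central trick in the patch above can be sketched in isolation: a branch ending in `__builtin_unreachable()` is deleted by the optimizer, but the condition guarding it is remembered as always true on the surviving paths. This is a simplified variant (the patch's macro also stores through a null pointer first), and `MY_BUILTIN_ASSUME` / `bit_mask` are hypothetical names, not part of the patch:

```cpp
// Simplified stand-in for __GLIBCXX_BUILTIN_ASSUME: on the path where
// the condition fails we invoke undefined behavior, so the compiler
// deletes that path and assumes the condition holds afterwards.
#define MY_BUILTIN_ASSUME(expr)		\
  do					\
    if (!(expr))			\
      __builtin_unreachable ();		\
  while (0)

// Mirrors _Bit_iterator::operator*: the shift is well defined only
// because the assumed range guarantees off < 64.
unsigned long
bit_mask (unsigned off)
{
  MY_BUILTIN_ASSUME (off < 64);	// cf. _M_assume_normalized()
  return 1UL << off;
}
```

With optimization enabled, GCC propagates the assumed range through later comparisons against constants, which is exactly what lets `_M_copy_aligned` drop the unreachable `__builtin_memmove` call in the PR110807 testcase.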

* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-07 11:33 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-07 11:33 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:a90be270cb009930d5f51c1a1a4836325f299872

commit a90be270cb009930d5f51c1a1a4836325f299872
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300


* [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807]
@ 2023-11-06 18:06 Alexandre Oliva
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Oliva @ 2023-11-06 18:06 UTC (permalink / raw)
  To: gcc-cvs, libstdc++-cvs

https://gcc.gnu.org/g:f808159120957477f7d149d51c130c9486b04d8d

commit f808159120957477f7d149d51c130c9486b04d8d
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Mon Nov 6 11:57:49 2023 -0300


end of thread, other threads:[~2023-11-09  1:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-06 15:03 [gcc(refs/users/aoliva/heads/testme)] libstdc++: optimize bit iterators assuming normalization [PR110807] Alexandre Oliva
2023-11-06 18:06 Alexandre Oliva
2023-11-07 11:33 Alexandre Oliva
2023-11-08 19:22 Alexandre Oliva
2023-11-08 23:48 Alexandre Oliva
2023-11-09  1:57 Alexandre Oliva

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).