public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Alexandre Oliva <oliva@adacore.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Subject: [PATCH v2] libstdc++: optimize bit iterators assuming normalization [PR110807]
Date: Wed, 08 Nov 2023 22:17:05 -0300	[thread overview]
Message-ID: <ormsvnhgj2.fsf_-_@lxoliva.fsfla.org> (raw)
In-Reply-To: <ZUviWUHetjSC7/rD@redhat.com> (Jonathan Wakely's message of "Wed, 8 Nov 2023 19:32:41 +0000")

On Nov  8, 2023, Jonathan Wakely <jwakely@redhat.com> wrote:

> A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
> __GLIBCXX_DISABLE_ASSUMPTIONS please.

That's entirely gone now.

>> +    do						\
>> +      if (std::is_constant_evaluated ())	\
>> +	static_assert(expr);			\

> This can never be valid.

*nod*

> This already works fine in constant evaluation anyway.

Yeah, that's what I figured.

> But what's the null dereference for?

The idea was to clearly trigger undefined behavior.  Maybe it wasn't
needed, it didn't occur to me that __builtin_unreachable() would be
enough.  I realize I was really trying to emulate attribute assume, even
without knowing it existed ;-)

>> +#define __GLIBCXX_BUILTIN_ASSUME(expr)		\
>> +    (void)(false && (expr))

> What's the point of this, just to verify that (expr) is contextually
> convertible to bool?

I'd have phrased it as "avoid the case in which something compiles with
-O0 but not with -O", but yeah ;-)

> We don't use the _p suffix for predicates in the library.
> Please use just _M_normalized or _M_is_normalized.

ACK.  It's also gone now.

> But do we even need this function? It's not used anywhere else, can we
> just inline the condition into _M_assume_normalized() ?

I had other uses for it in earlier versions of the patch, but it makes
no sense any more indeed.

>> +    _GLIBCXX20_CONSTEXPR
>> +    void
>> +    _M_assume_normalized() const

> I think this should use _GLIBCXX_ALWAYS_INLINE

*nod*, thanks

>> +    {
>> +      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());

> Is there even any benefit to this macro?

I just thought it could have other uses, without being aware that the
entire concept was available as a statement attribute.  Funny, I'd even
searched for it among the existing attributes and builtins, but somehow
I managed to miss it.  Thanks for getting me back down that path.

>        __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));

That unfortunately doesn't work, because the assume lowering doesn't go
as far as dereferencing the implicit this and making an SSA_NAME out of
the loaded _M_offset, which we'd need to be able to optimize based on
it.  But that only took me a while to figure out and massage into
something that had the desired effect.  Now, maybe the above *should*
have that effect already, but unfortunately it doesn't.

> Maybe even get rid of _M_assume_normalized() as a function and just
> put that attribute everywhere you currently use _M_assume_normalized.

Because of the slight kludge required to make the attribute have the
desired effect (namely ensuring the _M_offset reference is evaluated),
I've retained it as an inline function.

Here's what I'm retesting now.  WDYT?


libstdc++: optimize bit iterators assuming normalization [PR110807]

The representation of bit iterators, using a pointer into an array of
words, and an unsigned bit offset into that word, makes for some
optimization challenges: because the compiler doesn't know that the
offset is always in a certain narrow range, beginning at zero and
ending before the word bitwidth, when a function loads an offset that
it hasn't normalized itself, it may fail to derive certain reasonable
conclusions, even to the point of retaining useless calls that elicit
incorrect warnings.

Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
list to a global bit vector variable.  Based on the compile-time
constant length of the list, we decide in _M_insert_range whether to
use the existing storage or to allocate new storage for the vector.
After allocation, we decide in _M_copy_aligned how to copy any
preexisting portions of the vector to the newly-allocated storage.
When copying two or more words, we use __builtin_memmove.

However, because we compute the available room using bit offsets
without range information, even comparing them with constants, we fail
to infer ranges for the preexisting vector depending on word size, and
may thus retain the memmove call despite knowing we've only allocated
one word.

Other parts of the compiler then detect the mismatch between the
constant allocation size and the much larger range that could
theoretically be copied into the newly-allocated storage if we could
reach the call.

Ensuring the compiler is aware of the constraints on the offset range
enables it to do a much better job at optimizing.  Using attribute
assume (_M_offset <= ...) didn't work, because gimple lowered that to
something that vrp could only use to ensure 'this' was non-NULL.
Exposing _M_offset as an automatic variable/gimple register outside
the unevaluated assume operand enabled the optimizer to do its job.

Rather than placing such load-then-assume constructs all over, I
introduced an always-inline member function in bit iterators that does
the job of conveying to the compiler the information that the
assumption is supposed to hold, and various calls throughout functions
pertaining to bit iterators that might not otherwise know that the
offsets have to be in range, so that the compiler no longer needs to
make conservative assumptions that prevent optimizations.

With the explicit assumptions, the compiler can correlate the test for
available storage in the vector with the test for how much storage
might need to be copied, and determine that, if we're not asking for
enough room for two or more words, we can omit entirely the code to
copy two or more words, without any runtime overhead whatsoever: no
traces remain of the undefined behavior or of the tests that inform
the compiler about the assumptions that must hold.


for  libstdc++-v3/ChangeLog

	PR libstdc++/110807
	* include/bits/stl_bvector.h (_Bit_iterator_base): Add
	_M_assume_normalized member function.  Call it in _M_bump_up,
	_M_bump_down, _M_incr, operator==, operator<=>, operator<, and
	operator-.
	(_Bit_iterator): Also call it in operator*.
	(_Bit_const_iterator): Likewise.
---
 libstdc++-v3/include/bits/stl_bvector.h |   37 ++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index 8d18bcaffd434..bb9f38cdf2d59 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -56,6 +56,10 @@
 #ifndef _STL_BVECTOR_H
 #define _STL_BVECTOR_H 1
 
+#ifndef _GLIBCXX_ALWAYS_INLINE
+#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
+#endif
+
 #if __cplusplus >= 201103L
 #include <initializer_list>
 #include <bits/functional_hash.h>
@@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _Bit_type * _M_p;
     unsigned int _M_offset;
 
+    _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
+    void
+    _M_assume_normalized() const
+    {
+      unsigned int ofst = _M_offset;
+      __attribute__ ((__assume__ (ofst < unsigned(_S_word_bit))));
+    }
+
     _GLIBCXX20_CONSTEXPR
     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
     : _M_p(__x), _M_offset(__y) { }
@@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_up()
     {
+      _M_assume_normalized();
       if (_M_offset++ == int(_S_word_bit) - 1)
 	{
 	  _M_offset = 0;
@@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_bump_down()
     {
+      _M_assume_normalized();
       if (_M_offset-- == 0)
 	{
 	  _M_offset = int(_S_word_bit) - 1;
@@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     void
     _M_incr(ptrdiff_t __i)
     {
+      _M_assume_normalized();
       difference_type __n = __i + _M_offset;
       _M_p += __n / int(_S_word_bit);
       __n = __n % int(_S_word_bit);
@@ -221,7 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD
     friend _GLIBCXX20_CONSTEXPR bool
     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
+    {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
+    }
 
 #if __cpp_lib_three_way_comparison
     [[nodiscard]]
@@ -229,6 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     noexcept
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
 	return __cmp;
       return __x._M_offset <=> __y._M_offset;
@@ -238,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend bool
     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return __x._M_p < __y._M_p
 	    || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
     }
@@ -266,6 +289,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
     {
+      __x._M_assume_normalized();
+      __y._M_assume_normalized();
       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
 	      + __x._M_offset - __y._M_offset);
     }
@@ -297,7 +322,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     reference
     operator*() const
-    { return reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     iterator&
@@ -408,7 +436,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
     const_reference
     operator*() const
-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
+    {
+      _M_assume_normalized();
+      return _Bit_reference(_M_p, 1UL << _M_offset);
+    }
 
     _GLIBCXX20_CONSTEXPR
     const_iterator&


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

  reply	other threads:[~2023-11-09  1:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-08 16:10 [PATCH] " Alexandre Oliva
2023-11-08 19:32 ` Jonathan Wakely
2023-11-09  1:17   ` Alexandre Oliva [this message]
2023-11-09  1:22     ` [PATCH v2] " Jonathan Wakely
2023-11-09  3:36       ` [PATCH v3] " Alexandre Oliva
2024-02-07 16:25       ` [PATCH v2] " Torbjorn SVENSSON
2024-02-07 16:36         ` Jonathan Wakely
2024-02-09  8:49           ` Torbjorn SVENSSON

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ormsvnhgj2.fsf_-_@lxoliva.fsfla.org \
    --to=oliva@adacore.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely@redhat.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).