Date: Wed, 8 Nov 2023 19:32:41 +0000
From: Jonathan Wakely
To: Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Subject: Re: [PATCH] libstdc++: optimize bit iterators assuming normalization [PR110807]

On 08/11/23 13:10 -0300, Alexandre Oliva wrote:
>The representation of bit iterators, using a pointer into an array of
>words, and an unsigned bit offset into that word, makes for some
>optimization challenges: because the compiler doesn't know that the
>offset is always in a certain narrow range, beginning at zero and
>ending before the word bitwidth, when a function loads an offset that
>it hasn't normalized itself, it may fail
>to derive certain reasonable
>conclusions, even to the point of retaining useless calls that elicit
>incorrect warnings.
>
>Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
>list to a global bit vector variable. Based on the compile-time
>constant length of the list, we decide in _M_insert_range whether to
>use the existing storage or to allocate new storage for the vector.
>After allocation, we decide in _M_copy_aligned how to copy any
>preexisting portions of the vector to the newly-allocated storage.
>When copying two or more words, we use __builtin_memmove.
>
>However, because we compute the available room using bit offsets
>without range information, even comparing them with constants, we fail
>to infer ranges for the preexisting vector depending on word size, and
>may thus retain the memmove call despite knowing we've only allocated
>one word.
>
>Other parts of the compiler then detect the mismatch between the
>constant allocation size and the much larger range that could
>theoretically be copied into the newly-allocated storage if we could
>reach the call.
>
>Ensuring the compiler is aware of the constraints on the offset range
>enables it to do a much better job at optimizing. The challenge is to
>do so without runtime overhead, because this is not about checking
>that it's in range, it's only about telling the compiler about it.
>
>This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
>optimizing, expands to code that invokes undefined behavior in case
>the expression doesn't hold, so that the compiler optimizes out the
>test and the entire branch containing it, while retaining enough
>information about the paths that shouldn't be taken, so that on the
>remaining paths it optimizes based on the assumption.
>
>I also introduce a member function in bit iterators that conveys to
>the compiler the information that the assumption is supposed to hold,
>and various calls throughout member functions of bit iterators that
>might not otherwise know that the offsets have to be in range,
>making pessimistic decisions and failing to optimize out cases that it
>could.
>
>With the explicit assumptions, the compiler can correlate the test for
>available storage in the vector with the test for how much storage
>might need to be copied, and determine that, if we're not asking for
>enough room for two or more words, we can omit entirely the code to
>copy two or more words, without any runtime overhead whatsoever: no
>traces remain of the undefined behavior or of the tests that inform
>the compiler about the assumptions that must hold.
>
>Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
>x86_64-. Ok to install?
>
>(It was later found to fix 23_containers/vector/bool/allocator/copy_cc
>on x86_64-linux-gnu as well, that fails on gcc-13 with the same warning.)
>
>(The constant_evaluated/static_assert bit is disabled because expr is
>not a constant according to some libstdc++ build errors, but there
>doesn't seem to be a problem with the other bits. I haven't really
>thought that bit through, it was something I started out as potentially
>desirable, but that turned out to be not required. I suppose I could
>just drop it.)
>
>(I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general
>place and put to more general uses, but I didn't feel that bold ;-)
>
>
>for libstdc++-v3/ChangeLog
>
>	PR libstdc++/110807
>	* include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
>	(_Bit_iterator_base): Add _M_normalized_p and
>	_M_assume_normalized. Use them in _M_bump_up, _M_bump_down,
>	_M_incr, operator==, operator<=>, operator<, and operator-.
>	(_Bit_iterator): Also use them in operator*.
>	(_Bit_const_iterator): Likewise.
>---
> libstdc++-v3/include/bits/stl_bvector.h |   75 ++++++++++++++++++++++++++++++-
> 1 file changed, 72 insertions(+), 3 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
>index 8d18bcaffd434..81b316846454b 100644
>--- a/libstdc++-v3/include/bits/stl_bvector.h
>+++ b/libstdc++-v3/include/bits/stl_bvector.h
>@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _Bit_type * _M_p;
>     unsigned int _M_offset;
>
>+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
>+// If the assumption (EXPR) fails, invoke undefined behavior, so that
>+// the test and the failure block gets optimized out, but the compiler
>+// still recalls that (expr) can be taken for granted. Use this only
>+// for expressions that are simple and explicit enough that the
>+// compiler can optimize based on them. When not optimizing, the
>+// expression is still compiled, but it's never executed.
>+#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \

A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
__GLIBCXX_DISABLE_ASSUMPTIONS please.

>+  do \
>+    if (std::is_constant_evaluated ()) \
>+      static_assert(expr); \

This can never be valid. The true branch of the
if (is_constant_evaluated) statement must always compile, which means
expr must be a valid constant expression to be usable in a
static_assert. It can't be a runtime condition like
"_M_offset < _S_word_bit".

>+    else if (!(expr)) \
>+      { \
>+	void **__assert_failed = 0; \
>+	*__assert_failed = 0; \
>+	__builtin_unreachable (); \

This already works fine in constant evaluation anyway. If a
__builtin_unreachable() is reached during constant evaluation, the
compilation fails. So this already serves as static_assert(expr),
except it actually works :-)

But what's the null dereference for?
>+      } \
>+  while (0)
>+#else
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \
>+  do \
>+    if (!(expr)) \
>+      { \
>+	void **__assert_failed = 0; \
>+	*__assert_failed = 0; \
>+	__builtin_unreachable (); \
>+      } \
>+  while (0)
>+#endif
>+#else
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \
>+  (void)(false && (expr))

What's the point of this, just to verify that (expr) is contextually
convertible to bool?

>+#endif
>+
>+    _GLIBCXX20_CONSTEXPR
>+    bool
>+    _M_normalized_p() const

We don't use the _p suffix for predicates in the library. Please use
just _M_normalized or _M_is_normalized.

But do we even need this function? It's not used anywhere else, can we
just inline the condition into _M_assume_normalized() ?

>+    {
>+      return (_M_offset < unsigned(_S_word_bit));
>+    }
>+
>+    _GLIBCXX20_CONSTEXPR
>+    void
>+    _M_assume_normalized() const

I think this should use _GLIBCXX_ALWAYS_INLINE

>+    {
>+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());

Is there even any benefit to this macro? Can we just define this
function as:

    if (_M_offset >= unsigned(_S_word_bit))
      __builtin_unreachable();

Or better still:

    __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));

Maybe even get rid of _M_assume_normalized() as a function and just
put that attribute everywhere you currently use _M_assume_normalized.
>+    }
>+
>     _GLIBCXX20_CONSTEXPR
>     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
>     : _M_p(__x), _M_offset(__y) { }
>@@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_bump_up()
>     {
>+      _M_assume_normalized();
>       if (_M_offset++ == int(_S_word_bit) - 1)
>	{
>	  _M_offset = 0;
>@@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_bump_down()
>     {
>+      _M_assume_normalized();
>       if (_M_offset-- == 0)
>	{
>	  _M_offset = int(_S_word_bit) - 1;
>@@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_incr(ptrdiff_t __i)
>     {
>+      _M_assume_normalized();
>       difference_type __n = __i + _M_offset;
>       _M_p += __n / int(_S_word_bit);
>       __n = __n % int(_S_word_bit);
>@@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD
>     friend _GLIBCXX20_CONSTEXPR bool
>     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
>+    {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
>+    }
>
> #if __cpp_lib_three_way_comparison
>     [[nodiscard]]
>@@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     noexcept
>     {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
>	 return __cmp;
>       return __x._M_offset <=> __y._M_offset;
>@@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     friend bool
>     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       return __x._M_p < __y._M_p
>	     || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
>     }
>@@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
>     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     {
>+      // Make _M_offset's range visible to optimizers.
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
>	       + __x._M_offset - __y._M_offset);
>     }
>@@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>     reference
>     operator*() const
>-    { return reference(_M_p, 1UL << _M_offset); }
>+    {
>+      _M_assume_normalized();
>+      return reference(_M_p, 1UL << _M_offset);
>+    }
>
>     _GLIBCXX20_CONSTEXPR
>     iterator&
>@@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>     const_reference
>     operator*() const
>-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
>+    {
>+      _M_assume_normalized();
>+      return _Bit_reference(_M_p, 1UL << _M_offset);
>+    }
>
>     _GLIBCXX20_CONSTEXPR
>     const_iterator&