Date: Wed, 8 Nov 2023 19:32:41 +0000
From: Jonathan Wakely
To: Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Subject: Re: [PATCH] libstdc++: optimize bit iterators assuming normalization [PR110807]

On 08/11/23 13:10 -0300, Alexandre Oliva wrote:
>The representation of bit iterators, using a pointer into an array of
>words, and an unsigned bit offset into that word, makes for some
>optimization challenges: because the compiler doesn't know that the
>offset is always in a certain narrow range, beginning at zero and
>ending before the word bitwidth, when a function loads an offset that
>it hasn't normalized itself, it may fail
>to derive certain reasonable
>conclusions, even to the point of retaining useless calls that elicit
>incorrect warnings.
>
>Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
>list to a global bit vector variable. Based on the compile-time
>constant length of the list, we decide in _M_insert_range whether to
>use the existing storage or to allocate new storage for the vector.
>After allocation, we decide in _M_copy_aligned how to copy any
>preexisting portions of the vector to the newly-allocated storage.
>When copying two or more words, we use __builtin_memmove.
>
>However, because we compute the available room using bit offsets
>without range information, even comparing them with constants, we fail
>to infer ranges for the preexisting vector depending on word size, and
>may thus retain the memmove call despite knowing we've only allocated
>one word.
>
>Other parts of the compiler then detect the mismatch between the
>constant allocation size and the much larger range that could
>theoretically be copied into the newly-allocated storage if we could
>reach the call.
>
>Ensuring the compiler is aware of the constraints on the offset range
>enables it to do a much better job at optimizing. The challenge is to
>do so without runtime overhead, because this is not about checking
>that it's in range, it's only about telling the compiler about it.
>
>This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when
>optimizing, expands to code that invokes undefined behavior in case
>the expression doesn't hold, so that the compiler optimizes out the
>test and the entire branch containing it, while retaining enough
>information about the paths that shouldn't be taken, so that on the
>remaining paths it optimizes based on the assumption.
>
>I also introduce a member function in bit iterators that conveys to
>the compiler the information that the assumption is supposed to hold,
>and various calls throughout member functions of bit iterators that
>might not otherwise know that the offsets have to be in range,
>making pessimistic decisions and failing to optimize out cases that it
>could.
>
>With the explicit assumptions, the compiler can correlate the test for
>available storage in the vector with the test for how much storage
>might need to be copied, and determine that, if we're not asking for
>enough room for two or more words, we can omit entirely the code to
>copy two or more words, without any runtime overhead whatsoever: no
>traces remain of the undefined behavior or of the tests that inform
>the compiler about the assumptions that must hold.
>
>Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and
>x86_64-. Ok to install?
>
>(It was later found to fix 23_containers/vector/bool/allocator/copy_cc
>on x86_64-linux-gnu as well, that fails on gcc-13 with the same warning.)
>
>(The constant_evaluated/static_assert bit is disabled because expr is
>not a constant according to some libstdc++ build errors, but there
>doesn't seem to be a problem with the other bits. I haven't really
>thought that bit through, it was something I started out as potentially
>desirable, but that turned out to be not required. I suppose I could
>just drop it.)
>
>(I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general
>place and put to more general uses, but I didn't feel that bold ;-)
>
>
>for libstdc++-v3/ChangeLog
>
>	PR libstdc++/110807
>	* include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New.
>	(_Bit_iterator_base): Add _M_normalized_p and
>	_M_assume_normalized. Use them in _M_bump_up, _M_bump_down,
>	_M_incr, operator==, operator<=>, operator<, and operator-.
>	(_Bit_iterator): Also use them in operator*.
>	(_Bit_const_iterator): Likewise.
>---
> libstdc++-v3/include/bits/stl_bvector.h |   75 ++++++++++++++++++++++++++++++-
> 1 file changed, 72 insertions(+), 3 deletions(-)
>
>diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
>index 8d18bcaffd434..81b316846454b 100644
>--- a/libstdc++-v3/include/bits/stl_bvector.h
>+++ b/libstdc++-v3/include/bits/stl_bvector.h
>@@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _Bit_type * _M_p;
>     unsigned int _M_offset;
>
>+#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS
>+// If the assumption (EXPR) fails, invoke undefined behavior, so that
>+// the test and the failure block gets optimized out, but the compiler
>+// still recalls that (expr) can be taken for granted. Use this only
>+// for expressions that are simple and explicit enough that the
>+// compiler can optimize based on them. When not optimizing, the
>+// expression is still compiled, but it's never executed.
>+#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \

A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
__GLIBCXX_DISABLE_ASSUMPTIONS please.

>+  do \
>+    if (std::is_constant_evaluated ()) \
>+      static_assert(expr); \

This can never be valid. The true branch of the
if (is_constant_evaluated) statement must always compile, which means
expr must be a valid constant expression to be usable in a
static_assert. It can't be a runtime condition like
"_M_offset < _S_word_bit".

>+    else if (!(expr)) \
>+      { \
>+	void **__assert_failed = 0; \
>+	*__assert_failed = 0; \
>+	__builtin_unreachable (); \

This already works fine in constant evaluation anyway. If a
__builtin_unreachable() is reached during constant evaluation, the
compilation fails. So this already serves as static_assert(expr),
except it actually works :-)

But what's the null dereference for?
>+      } \
>+  while (0)
>+#else
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \
>+  do \
>+    if (!(expr)) \
>+      { \
>+	void **__assert_failed = 0; \
>+	*__assert_failed = 0; \
>+	__builtin_unreachable (); \
>+      } \
>+  while (0)
>+#endif
>+#else
>+#define __GLIBCXX_BUILTIN_ASSUME(expr) \
>+  (void)(false && (expr))

What's the point of this, just to verify that (expr) is contextually
convertible to bool?

>+#endif
>+
>+    _GLIBCXX20_CONSTEXPR
>+    bool
>+    _M_normalized_p() const

We don't use the _p suffix for predicates in the library. Please use
just _M_normalized or _M_is_normalized.

But do we even need this function? It's not used anywhere else, can we
just inline the condition into _M_assume_normalized() ?

>+    {
>+      return (_M_offset < unsigned(_S_word_bit));
>+    }
>+
>+    _GLIBCXX20_CONSTEXPR
>+    void
>+    _M_assume_normalized() const

I think this should use _GLIBCXX_ALWAYS_INLINE

>+    {
>+      __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());

Is there even any benefit to this macro? Can we just define this
function as:

    if (_M_offset >= unsigned(_S_word_bit))
      __builtin_unreachable();

Or better still:

    __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));

Maybe even get rid of _M_assume_normalized() as a function and just
put that attribute everywhere you currently use _M_assume_normalized.
>+    }
>+
>     _GLIBCXX20_CONSTEXPR
>     _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
>     : _M_p(__x), _M_offset(__y) { }
>@@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_bump_up()
>     {
>+      _M_assume_normalized();
>       if (_M_offset++ == int(_S_word_bit) - 1)
>	{
>	  _M_offset = 0;
>@@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_bump_down()
>     {
>+      _M_assume_normalized();
>       if (_M_offset-- == 0)
>	{
>	  _M_offset = int(_S_word_bit) - 1;
>@@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     void
>     _M_incr(ptrdiff_t __i)
>     {
>+      _M_assume_normalized();
>       difference_type __n = __i + _M_offset;
>       _M_p += __n / int(_S_word_bit);
>       __n = __n % int(_S_word_bit);
>@@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD
>     friend _GLIBCXX20_CONSTEXPR bool
>     operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>-    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
>+    {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>+      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
>+    }
>
> #if __cpp_lib_three_way_comparison
>     [[nodiscard]]
>@@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     noexcept
>     {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
>	 return __cmp;
>       return __x._M_offset <=> __y._M_offset;
>@@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     friend bool
>     operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     {
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       return __x._M_p < __y._M_p
>	     || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
>     }
>@@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     friend _GLIBCXX20_CONSTEXPR ptrdiff_t
>     operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>     {
>+      // Make _M_offset's range visible to optimizers.
>+      __x._M_assume_normalized();
>+      __y._M_assume_normalized();
>       return (int(_S_word_bit) * (__x._M_p - __y._M_p)
>	       + __x._M_offset - __y._M_offset);
>     }
>@@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>     reference
>     operator*() const
>-    { return reference(_M_p, 1UL << _M_offset); }
>+    {
>+      _M_assume_normalized();
>+      return reference(_M_p, 1UL << _M_offset);
>+    }
>
>     _GLIBCXX20_CONSTEXPR
>     iterator&
>@@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>     _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>     const_reference
>     operator*() const
>-    { return _Bit_reference(_M_p, 1UL << _M_offset); }
>+    {
>+      _M_assume_normalized();
>+      return _Bit_reference(_M_p, 1UL << _M_offset);
>+    }
>
>     _GLIBCXX20_CONSTEXPR
>     const_iterator&