From: Jonathan Wakely
Date: Thu, 9 Nov 2023 01:22:13 +0000
Subject: Re: [PATCH v2] libstdc++: optimize bit iterators assuming normalization [PR110807]
To: Alexandre Oliva
Cc: Jonathan Wakely, gcc-patches, "libstdc++"
On Thu, 9 Nov 2023, 01:17 Alexandre Oliva, wrote:

> On Nov 8, 2023, Jonathan Wakely wrote:
>
> > A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
> > __GLIBCXX_DISABLE_ASSUMPTIONS please.
>
> That's entirely gone now.
>
> >> +      do                                                \
> >> +        if (std::is_constant_evaluated ())              \
> >> +          static_assert(expr);                          \
>
> > This can never be valid.
>
> *nod*
>
> > This already works fine in constant evaluation anyway.
>
> Yeah, that's what I figured.
>
> > But what's the null dereference for?
>
> The idea was to clearly trigger undefined behavior.  Maybe it wasn't
> needed, it didn't occur to me that __builtin_unreachable() would be
> enough.  I realize I was really trying to emulate attribute assume,
> even without knowing it existed ;-)
>
> >> +#define __GLIBCXX_BUILTIN_ASSUME(expr)          \
> >> +  (void)(false && (expr))
>
> > What's the point of this, just to verify that (expr) is contextually
> > convertible to bool?
>
> I'd have phrased it as "avoid the case in which something compiles
> with -O0 but not with -O", but yeah ;-)
>
> > We don't use the _p suffix for predicates in the library.
> > Please use just _M_normalized or _M_is_normalized.
>
> ACK.  It's also gone now.
>
> > But do we even need this function?  It's not used anywhere else,
> > can we just inline the condition into _M_assume_normalized() ?
>
> I had other uses for it in earlier versions of the patch, but it
> makes no sense any more indeed.
>
> >> +      _GLIBCXX20_CONSTEXPR
> >> +      void
> >> +      _M_assume_normalized() const
>
> > I think this should use _GLIBCXX_ALWAYS_INLINE
>
> *nod*, thanks
>
> >> +      {
> >> +        __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
>
> > Is there even any benefit to this macro?
>
> I just thought it could have other uses, without being aware that the
> entire concept was available as a statement attribute.  Funny, I'd
> even searched for it among the existing attributes and builtins, but
> somehow I managed to miss it.  Thanks for getting me back down that
> path.
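For readers following along, the `__builtin_unreachable()` idiom mentioned above (the portable way to express an assumption before GCC 13 added the `assume` statement attribute) can be sketched as a minimal standalone example. The names `assume_true` and `mod_word_bit` are hypothetical illustrations, not libstdc++ code:

```cpp
#include <cassert>

// Portable "assume" idiom: if the condition is false, control would
// reach __builtin_unreachable(), so the optimizer may treat the
// condition as always true on every path that follows.
inline void assume_true(bool cond)
{
  if (!cond)
    __builtin_unreachable();
}

// With the assumption in place, the compiler can fold offset % 32
// to just offset, since offset is known to be below 32.
unsigned mod_word_bit(unsigned offset)
{
  assume_true(offset < 32);
  return offset % 32;
}
```

If the asserted condition is ever false at runtime, behavior is undefined, exactly as with the null-dereference trick discussed above; the unreachable form just states that intent directly.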
> > __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));
>
> That unfortunately doesn't work, because the assume lowering doesn't
> go as far as dereferencing the implicit this and making an SSA_NAME
> out of the loaded _M_offset, which we'd need to be able to optimize
> based on it.  But that only took me a while to figure out and massage
> into something that had the desired effect.  Now, maybe the above
> *should* have that effect already, but unfortunately it doesn't.
>
> > Maybe even get rid of _M_assume_normalized() as a function and just
> > put that attribute everywhere you currently use
> > _M_assume_normalized.
>
> Because of the slight kludge required to make the attribute have the
> desired effect (namely ensuring the _M_offset reference is
> evaluated), I've retained it as an inline function.
>
> Here's what I'm retesting now.  WDYT?
>

ofst needs to be __ofst but OK for trunk with that change.

We probably want this on the gcc-13 branch too, but let's give it some
time on trunk in case the assume attribute isn't quite ready for prime
time.

>
> libstdc++: optimize bit iterators assuming normalization [PR110807]
>
> The representation of bit iterators, using a pointer into an array of
> words, and an unsigned bit offset into that word, makes for some
> optimization challenges: because the compiler doesn't know that the
> offset is always in a certain narrow range, beginning at zero and
> ending before the word bitwidth, when a function loads an offset that
> it hasn't normalized itself, it may fail to derive certain reasonable
> conclusions, even to the point of retaining useless calls that elicit
> incorrect warnings.
>
> Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
> list to a global bit vector variable.  Based on the compile-time
> constant length of the list, we decide in _M_insert_range whether to
> use the existing storage or to allocate new storage for the vector.
> After allocation, we decide in _M_copy_aligned how to copy any
> preexisting portions of the vector to the newly-allocated storage.
> When copying two or more words, we use __builtin_memmove.
>
> However, because we compute the available room using bit offsets
> without range information, even comparing them with constants, we
> fail to infer ranges for the preexisting vector depending on word
> size, and may thus retain the memmove call despite knowing we've only
> allocated one word.
>
> Other parts of the compiler then detect the mismatch between the
> constant allocation size and the much larger range that could
> theoretically be copied into the newly-allocated storage if we could
> reach the call.
>
> Ensuring the compiler is aware of the constraints on the offset range
> enables it to do a much better job at optimizing.  Using attribute
> assume (_M_offset <= ...) didn't work, because gimple lowered that to
> something that vrp could only use to ensure 'this' was non-NULL.
> Exposing _M_offset as an automatic variable/gimple register outside
> the unevaluated assume operand enabled the optimizer to do its job.
>
> Rather than placing such load-then-assume constructs all over, I
> introduced an always-inline member function in bit iterators that
> does the job of conveying to the compiler the information that the
> assumption is supposed to hold, and various calls throughout
> functions pertaining to bit iterators that might not otherwise know
> that the offsets have to be in range, so that the compiler no longer
> needs to make conservative assumptions that prevent optimizations.
>
> With the explicit assumptions, the compiler can correlate the test
> for available storage in the vector with the test for how much
> storage might need to be copied, and determine that, if we're not
> asking for enough room for two or more words, we can omit entirely
> the code to copy two or more words, without any runtime overhead
> whatsoever: no traces remain of the undefined behavior or of the
> tests that inform the compiler about the assumptions that must hold.
>
>
> for  libstdc++-v3/ChangeLog
>
>         PR libstdc++/110807
>         * include/bits/stl_bvector.h (_Bit_iterator_base): Add
>         _M_assume_normalized member function.  Call it in _M_bump_up,
>         _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
>         operator-.
>         (_Bit_iterator): Also call it in operator*.
>         (_Bit_const_iterator): Likewise.
> ---
>  libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
> index 8d18bcaffd434..bb9f38cdf2d59 100644
> --- a/libstdc++-v3/include/bits/stl_bvector.h
> +++ b/libstdc++-v3/include/bits/stl_bvector.h
> @@ -56,6 +56,10 @@
>  #ifndef _STL_BVECTOR_H
>  #define _STL_BVECTOR_H 1
>
> +#ifndef _GLIBCXX_ALWAYS_INLINE
> +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
> +#endif
> +
>  #if __cplusplus >= 201103L
>  #include
>  #include
> @@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>    _Bit_type * _M_p;
>    unsigned int _M_offset;
>
> +  _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
> +  void
> +  _M_assume_normalized() const
> +  {
> +    unsigned int ofst = _M_offset;
> +    __attribute__ ((__assume__ (ofst < unsigned(_S_word_bit))));
> +  }
> +
>    _GLIBCXX20_CONSTEXPR
>    _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
>    : _M_p(__x), _M_offset(__y) { }
> @@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_bump_up()
>      {
> +      _M_assume_normalized();
>        if (_M_offset++ == int(_S_word_bit) - 1)
>          {
>            _M_offset = 0;
> @@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_bump_down()
>      {
> +      _M_assume_normalized();
>        if (_M_offset-- == 0)
>          {
>            _M_offset = int(_S_word_bit) - 1;
> @@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_incr(ptrdiff_t __i)
>      {
> +      _M_assume_normalized();
>        difference_type __n = __i + _M_offset;
>        _M_p += __n / int(_S_word_bit);
>        __n = __n % int(_S_word_bit);
> @@ -221,7 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD
>      friend _GLIBCXX20_CONSTEXPR bool
>      operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
> -    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
> +    {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
> +      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
> +    }
>
>  #if __cpp_lib_three_way_comparison
>      [[nodiscard]]
> @@ -229,6 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      noexcept
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
>          return __cmp;
>        return __x._M_offset <=> __y._M_offset;
> @@ -238,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      friend bool
>      operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        return __x._M_p < __y._M_p
>          || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
>      }
> @@ -266,6 +289,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      friend _GLIBCXX20_CONSTEXPR ptrdiff_t
>      operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        return (int(_S_word_bit) * (__x._M_p - __y._M_p)
>                + __x._M_offset - __y._M_offset);
>      }
> @@ -297,7 +322,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>      reference
>      operator*() const
> -    { return reference(_M_p, 1UL << _M_offset); }
> +    {
> +      _M_assume_normalized();
> +      return reference(_M_p, 1UL << _M_offset);
> +    }
>
>      _GLIBCXX20_CONSTEXPR
>      iterator&
> @@ -408,7 +436,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>      const_reference
>      operator*() const
> -    { return _Bit_reference(_M_p, 1UL << _M_offset); }
> +    {
> +      _M_assume_normalized();
> +      return _Bit_reference(_M_p, 1UL << _M_offset);
> +    }
>
>      _GLIBCXX20_CONSTEXPR
>      const_iterator&
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
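The load-then-assume pattern the patch settles on, copying the member into a local so the assumption attaches to the loaded value rather than to `this`, can be sketched in miniature. This sketch uses the portable `__builtin_unreachable()` idiom in place of the GCC 13 `assume` attribute so it compiles with older compilers; `BitIter` and `WORD_BIT` are simplified stand-ins, not the real `_Bit_iterator_base`:

```cpp
#include <cassert>

enum { WORD_BIT = 32 };  // stand-in for _S_word_bit

struct BitIter           // simplified stand-in for _Bit_iterator_base
{
  unsigned long* p;      // word pointer
  unsigned offset;       // invariant: offset < WORD_BIT

  // Load the member into a local first, as the patch does, so the
  // assumption applies to the loaded value (an SSA name) rather than
  // merely implying that 'this' is non-null.
  __attribute__((__always_inline__))
  void assume_normalized() const
  {
    unsigned ofst = offset;
    if (ofst >= WORD_BIT)
      __builtin_unreachable();  // GCC/Clang builtin
  }

  // Counterpart of _M_bump_up: advance one bit, wrapping to the next
  // word when the offset reaches the top bit of the current word.
  void bump_up()
  {
    assume_normalized();
    if (offset++ == WORD_BIT - 1)
      {
        offset = 0;
        ++p;
      }
  }
};
```

With the assumption in scope, the optimizer knows `offset + 1` cannot exceed `WORD_BIT`, which is the kind of range fact the patch needs to eliminate the spurious oversized-memmove path described in the commit message.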