From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb2c.google.com (mail-yb1-xb2c.google.com [IPv6:2607:f8b0:4864:20::b2c]) by sourceware.org (Postfix) with ESMTPS id 3A3093856976 for ; Wed, 8 Nov 2023 16:10:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3A3093856976 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3A3093856976 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::b2c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699459831; cv=none; b=bIGbEhSZf/5+mlE1hXaeHR1GEg46LGTf80bQK1r62aNiUs4PpgdqCPUE6zXyQwUcJz2RnrBo60tvkQG/HZdFEBoHrwvmZRjxowU/avKUZiPULixRak1MRX+6zocOV2HBavLfj2pUTyUluFESF0964TNjSC2Wb/IF0VPN4HfYk4o= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699459831; c=relaxed/simple; bh=w4bxKR7RKVGM3HvqMqMa6ywscZYEHF4IFyMnurOz+fo=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=BIl6Qi0pUDSirsbdKWGbNxC29pQD8CS+99j2WRNSsQGx0x1BzUhJ6XEvWw8LwrBugqioIaPR9zlhjGMIik4+RNDhREB8AfaGmiT723EwezCqvIlmBgjhoDQu7mR5zymUFduzGQDR7HyrX5Lv0+hVtkDScxQVtl1NiuIZtxud5eM= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yb1-xb2c.google.com with SMTP id 3f1490d57ef6-da041ffef81so7709698276.0 for ; Wed, 08 Nov 2023 08:10:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1699459828; x=1700064628; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:date:organization:subject:to :from:from:to:cc:subject:date:message-id:reply-to; bh=eo18PgVuJkk0MHT25YLScw7mubcGGZX9X3+16dJYtnI=; b=TW8pUVjD5+7M+5SRu9TtCB4cm6gCVxadA9GV6Ska58oW7SHafBmDEbVSpvA5MoroYM 5N76XFU0CXtQt4wa0hqnEE0zKbaUSOQRikTutZX2yTTe1+IEplQUBBL+ttlpJSOodwJS 2EoWdDeQhrFsjgo9nN7T67r2QSFHavCMmftV8QU1OBaUJJMW3sskE6ac4wPStW342oSa /njb6S5/C8pEPEyc3Zgy2/uALtD91xbl4TxUbztdWTIqcIm9Fb6QFhsL2bh1ea2GGyQ+ TqBjapiq5OJ23R4+AxlOovIyg3HJVLZpAu2aulJJ/rJ2ZRT7PQz+LSxVpIeFjDA2UxoV xb8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699459828; x=1700064628; h=mime-version:user-agent:message-id:date:organization:subject:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eo18PgVuJkk0MHT25YLScw7mubcGGZX9X3+16dJYtnI=; b=mqCmZH402eJAAVgbD82ndy8GVs/+kyjU1GT8e31sBMZ7Ki0g+gc0NcplR+3WnYxOgs XBWUcE1E+uMaKqm5hzVBDgksNq5KLlbQijvB2O2+fSyH18TXTJcSPtavckWn1nr42bUk D7bc8EgvZMlTss6exuoHhiOm9NKLiX6cKITK8ayOXPJK2x6pwXDpDeowyXJKohilPtfw CNrU10vVd6dBfflgDp0LZIWlu2HWNF4d343vbV6bCU8o5tu5ciFx8V6x6rwDGn2CBS+p bb5Ov63OF8Sj79sEybOFEZO9NsSx2byZM5Nod7PLecXHKVJFHr85f0YGwh5RgdTF5Xey Q6pg== X-Gm-Message-State: AOJu0YxbxCXnjLZ2fkWObH3OKd9ml0nQJD4TwJ2eJFJyMhf+yO4uwYwM Hx1VyRGB0d5tbYOgiIOjgPJmwA== X-Google-Smtp-Source: AGHT+IHOteztVOmgZKK8kRgHcmsMLALid63AIEpT+keao+VjQZVm4u9Om36jgTyaGLY2RtwQNTX8rA== X-Received: by 2002:a25:7307:0:b0:d9a:dece:d9cb with SMTP id o7-20020a257307000000b00d9adeced9cbmr2013819ybc.54.1699459828295; Wed, 08 Nov 2023 08:10:28 -0800 (PST) Received: from free.home ([2804:7f1:2080:e9c8:ff5e:88e8:a900:d7b4]) by smtp.gmail.com with ESMTPSA id l130-20020a252588000000b00da0501b497asm6609959ybl.18.2023.11.08.08.10.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Nov 2023 08:10:27 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 3A8GAIJl2215264 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 8 Nov 2023 13:10:18 -0300 From: Alexandre Oliva To: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org Subject: [PATCH] libstdc++: optimize bit iterators assuming normalization [PR110807] Organization: Free thinker, does not speak for AdaCore Date: Wed, 08 Nov 2023 13:10:18 -0300 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URI_HEX,WEIRD_QUOTING autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: The representation of bit iterators, using a pointer into an array of words, and an unsigned bit offset into that word, makes for some optimization challenges: because the compiler doesn't know that the offset is always in a certain narrow range, beginning at zero and ending before the word bitwidth, when a function loads an offset that it hasn't normalized itself, it may fail to derive certain reasonable conclusions, even to the point of retaining useless calls that elicit incorrect warnings. Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit list to a global bit vector variable. Based on the compile-time constant length of the list, we decide in _M_insert_range whether to use the existing storage or to allocate new storage for the vector. After allocation, we decide in _M_copy_aligned how to copy any preexisting portions of the vector to the newly-allocated storage. When copying two or more words, we use __builtin_memmove. However, because we compute the available room using bit offsets without range information, even comparing them with constants, we fail to infer ranges for the preexisting vector depending on word size, and may thus retain the memmove call despite knowing we've only allocated one word. Other parts of the compiler then detect the mismatch between the constant allocation size and the much larger range that could theoretically be copied into the newly-allocated storage if we could reach the call. Ensuring the compiler is aware of the constraints on the offset range enables it to do a much better job at optimizing. The challenge is to do so without runtime overhead, because this is not about checking that it's in range, it's only about telling the compiler about it. This patch introduces a __GLIBCXX_BUILTIN_ASSUME macro that, when optimizing, expands to code that invokes undefined behavior in case the expression doesn't hold, so that the compiler optimizes out the test and the entire branch containing, but retaining enough information about the paths that shouldn't be taken, so that at remaining paths it optimizes based on the assumption. I also introduce a member function in bit iterators that conveys to the compiler the information that the assumption is supposed to hold, and various calls throughout member functions of bit iterators that might not otherwise know that the offsets have to be in range, making pessimistic decisions and failing to optimize out cases that it could. With the explicit assumptions, the compiler can correlate the test for available storage in the vector with the test for how much storage might need to be copied, and determine that, if we're not asking for enough room for two or more words, we can omit entirely the code to copy two or more words, without any runtime overhead whatsoever: no traces remain of the undefined behavior or of the tests that inform the compiler about the assumptions that must hold. Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on i686- and x86_64-. Ok to install? (It was later found to fix 23_containers/vector/bool/allocator/copy_cc on x86_64-linux-gnu as well, that fails on gcc-13 with the same warning.) (The constant_evaluated/static_assert bit is disabled because expr is not a constant according to some libstdc++ build errors, but there doesn't seem to be a problem with the other bits. I haven't really thought that bit through, it was something I started out as potentially desirable, but that turned out to be not required. I suppose I could just drop it.) (I suppose __GLIBCXX_BUILTIN_ASSUME could be moved to a more general place and put to more general uses, but I didn't feel that bold ;-) for libstdc++-v3/ChangeLog PR libstdc++/110807 * include/bits/stl_bvector.h (__GLIBCXX_BUILTIN_ASSUME): New. (_Bit_iterator_base): Add _M_normalized_p and _M_assume_normalized. Use them in _M_bump_up, _M_bump_down, _M_incr, operator==, operator<=>, operator<, and operator-. (_Bit_iterator): Also use them in operator*. (_Bit_const_iterator): Likewise. --- libstdc++-v3/include/bits/stl_bvector.h | 75 ++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index 8d18bcaffd434..81b316846454b 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -177,6 +177,55 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _Bit_type * _M_p; unsigned int _M_offset; +#if __OPTIMIZE__ && !__GLIBCXX_DISABLE_ASSUMPTIONS +// If the assumption (EXPR) fails, invoke undefined behavior, so that +// the test and the failure block gets optimized out, but the compiler +// still recalls that (expr) can be taken for granted. Use this only +// for expressions that are simple and explicit enough that the +// compiler can optimize based on them. When not optimizing, the +// expression is still compiled, but it's never executed. +#if 0 /* ??? */ && __cpp_lib_is_constant_evaluated +#define __GLIBCXX_BUILTIN_ASSUME(expr) \ + do \ + if (std::is_constant_evaluated ()) \ + static_assert(expr); \ + else if (!(expr)) \ + { \ + void **__assert_failed = 0; \ + *__assert_failed = 0; \ + __builtin_unreachable (); \ + } \ + while (0) +#else +#define __GLIBCXX_BUILTIN_ASSUME(expr) \ + do \ + if (!(expr)) \ + { \ + void **__assert_failed = 0; \ + *__assert_failed = 0; \ + __builtin_unreachable (); \ + } \ + while (0) +#endif +#else +#define __GLIBCXX_BUILTIN_ASSUME(expr) \ + (void)(false && (expr)) +#endif + + _GLIBCXX20_CONSTEXPR + bool + _M_normalized_p() const + { + return (_M_offset < unsigned(_S_word_bit)); + } + + _GLIBCXX20_CONSTEXPR + void + _M_assume_normalized() const + { + __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ()); + } + _GLIBCXX20_CONSTEXPR _Bit_iterator_base(_Bit_type * __x, unsigned int __y) : _M_p(__x), _M_offset(__y) { } @@ -185,6 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_bump_up() { + _M_assume_normalized(); if (_M_offset++ == int(_S_word_bit) - 1) { _M_offset = 0; @@ -196,6 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_bump_down() { + _M_assume_normalized(); if (_M_offset-- == 0) { _M_offset = int(_S_word_bit) - 1; @@ -207,6 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_incr(ptrdiff_t __i) { + _M_assume_normalized(); difference_type __n = __i + _M_offset; _M_p += __n / int(_S_word_bit); __n = __n % int(_S_word_bit); @@ -221,7 +273,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD friend _GLIBCXX20_CONSTEXPR bool operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) - { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; } + { + __x._M_assume_normalized(); + __y._M_assume_normalized(); + return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; + } #if __cpp_lib_three_way_comparison [[nodiscard]] @@ -229,6 +285,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) noexcept { + __x._M_assume_normalized(); + __y._M_assume_normalized(); if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0) return __cmp; return __x._M_offset <=> __y._M_offset; @@ -238,6 +296,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER friend bool operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) { + __x._M_assume_normalized(); + __y._M_assume_normalized(); return __x._M_p < __y._M_p || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset); } @@ -266,6 +326,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER friend _GLIBCXX20_CONSTEXPR ptrdiff_t operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) { + // Make _M_offset's range visible to optimizers. + __x._M_assume_normalized(); + __y._M_assume_normalized(); return (int(_S_word_bit) * (__x._M_p - __y._M_p) + __x._M_offset - __y._M_offset); } @@ -297,7 +360,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR reference operator*() const - { return reference(_M_p, 1UL << _M_offset); } + { + _M_assume_normalized(); + return reference(_M_p, 1UL << _M_offset); + } _GLIBCXX20_CONSTEXPR iterator& @@ -408,7 +474,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR const_reference operator*() const - { return _Bit_reference(_M_p, 1UL << _M_offset); } + { + _M_assume_normalized(); + return _Bit_reference(_M_p, 1UL << _M_offset); + } _GLIBCXX20_CONSTEXPR const_iterator& -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive