From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1134.google.com (mail-yw1-x1134.google.com [IPv6:2607:f8b0:4864:20::1134]) by sourceware.org (Postfix) with ESMTPS id 56E6E3858D33 for ; Thu, 9 Nov 2023 03:36:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 56E6E3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=adacore.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=adacore.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 56E6E3858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::1134 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699501017; cv=none; b=oUKdxw8tICn5FHkX0y/kZ1NvpsJxiJXj3UqCgy6HxFInhxFl8AKG22RWWbAOrZZpEGlL/BqJFme12QG4HnZ9k2KsdLSIAiNJFjSdpgudK+ci6354ClAG8sMe7J/aCJubVsFnz2QyZhd2GnJ9L2fv7077cqYvJ5RaWBKCclk5Ggk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699501017; c=relaxed/simple; bh=+JlD8hC5SP5WXaL7ZaDTYZBX1BqTY3jZfqfa/0rfB8c=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=Rse+ZWybcw2QQcqN/AmM+oQbt+u8O+1DqtRc3++KQd42Bek8fQ3QH9aLkjwVcM+eOKd5l+MCFCTJWWBShS7t7G+wxQoRSmB4LEWQE/hkwXzkn/tdiuGUPRolHP4NgDSRzo6qdlezPnn9febmT2n5mWMwbA5kpLagp9oSmP3/8v8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yw1-x1134.google.com with SMTP id 00721157ae682-5a7dd65052aso5479927b3.0 for ; Wed, 08 Nov 2023 19:36:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adacore.com; s=google; t=1699501014; x=1700105814; darn=gcc.gnu.org; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:cc:to:from:from:to:cc:subject:date:message-id :reply-to; bh=mOt7HJVHpOSODdFboENpNtylBlx+W+F4emTmLrR9nSs=; b=Zem0C3TlUk85KNy6ilEf3bQZlGysVoaRlli30GtDghggPxfc/6fF+FMblmt9tYz6IZ DTuPJiY3Rt4Ul/VwxDImodQ67Pavj+C9JPUra2XRuwEkJ0Of/2KRBuB+Fs2W6Dz8nka1 BBOxEJtR0MOg0chXakL1hkK4/hkSayQW6y5zsvDciF5uIrdmjZH24gpVBWFXpZbEYOCC S4/gf38uojnNwbANybGJeihS1ivv0ok7XPLavHbYQOhnL2InwZC1C4dByIHU2gx2UZDe SezVOJbQJ8cR9l7coUPOmHUQGUkt5phJMzPfXhFwfFg6olosLUszCIVweUcYiv4D7ucM SfHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699501014; x=1700105814; h=mime-version:user-agent:message-id:in-reply-to:date:references :organization:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mOt7HJVHpOSODdFboENpNtylBlx+W+F4emTmLrR9nSs=; b=hQfFM6LppURgltOKgBEd3gLmfUHogVVM/2VjDqDMNHURnbBz0RePH/Cs/lzqW04p18 TYCyOQj0j0QOoOJNK8wrcahCUtWZybiOryZymw1QV3if1FCUg/6NgARfmmBLdzSlbt5l RaiOlKrnzi5A6fJS0l/6PCNvXdJmUZXsSdkHoo0npRP2XkJ5XjlRaBvCQLXpfmfIY9uK XbVXCIGrcz8s9odmMVeAaFrpVtqBsjQzSf2VMKc1BWL5G9BEf/CL5/W9LC4N2etCwy6P 5VWgptarJM407NdVwmT9JLBUtIAW7Oom9PhWKH2uyeXbKXjoauWJAn95i1VW/2epIFzd 4G0w== X-Gm-Message-State: AOJu0Yzt+VWbcEBMkq2kU2c1UXObK2wJvIJaSYclc1Dbb55urYKflbZ8 KSsITYTID8HqC1V4P+lMZVBtIA== X-Google-Smtp-Source: AGHT+IHzAQGlEEDNs3jgZRyZ9H9xsA/ncIe0NpSsFcO637LDdzFeVR06R48MiICoEA3NLMR59TvMOg== X-Received: by 2002:a0d:c7c6:0:b0:5af:73cd:ba30 with SMTP id j189-20020a0dc7c6000000b005af73cdba30mr3861202ywd.39.1699501013734; Wed, 08 Nov 2023 19:36:53 -0800 (PST) Received: from free.home ([2804:7f1:2080:e9c8:ff5e:88e8:a900:d7b4]) by smtp.gmail.com with ESMTPSA id i187-20020a816dc4000000b005a7cc149e3asm7693794ywc.2.2023.11.08.19.36.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Nov 2023 19:36:53 -0800 (PST) Received: from livre (livre.home [172.31.160.2]) by free.home (8.15.2/8.15.2) with ESMTPS id 3A93ae4d2235159 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 9 Nov 2023 00:36:41 -0300 From: Alexandre Oliva To: Jonathan Wakely Cc: Jonathan Wakely , gcc-patches , "libstdc++" Subject: [PATCH v3] libstdc++: optimize bit iterators assuming normalization [PR110807] Organization: Free thinker, does not speak for AdaCore References: Date: Thu, 09 Nov 2023 00:36:40 -0300 In-Reply-To: (Jonathan Wakely's message of "Thu, 9 Nov 2023 01:22:13 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.84 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URI_HEX,WEIRD_QUOTING autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Nov 8, 2023, Jonathan Wakely wrote: > ofst needs to be __ofst but OK for trunk with that change. Oh, doh, thanks for catching that last-minute tweak. Retesting with that change completed successfully, so I've just pushed the following: libstdc++: optimize bit iterators assuming normalization [PR110807] The representation of bit iterators, using a pointer into an array of words, and an unsigned bit offset into that word, makes for some optimization challenges: because the compiler doesn't know that the offset is always in a certain narrow range, beginning at zero and ending before the word bitwidth, when a function loads an offset that it hasn't normalized itself, it may fail to derive certain reasonable conclusions, even to the point of retaining useless calls that elicit incorrect warnings. Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit list to a global bit vector variable. Based on the compile-time constant length of the list, we decide in _M_insert_range whether to use the existing storage or to allocate new storage for the vector. After allocation, we decide in _M_copy_aligned how to copy any preexisting portions of the vector to the newly-allocated storage. When copying two or more words, we use __builtin_memmove. However, because we compute the available room using bit offsets without range information, even comparing them with constants, we fail to infer ranges for the preexisting vector depending on word size, and may thus retain the memmove call despite knowing we've only allocated one word. Other parts of the compiler then detect the mismatch between the constant allocation size and the much larger range that could theoretically be copied into the newly-allocated storage if we could reach the call. Ensuring the compiler is aware of the constraints on the offset range enables it to do a much better job at optimizing. Using attribute assume (_M_offset <= ...) didn't work, because gimple lowered that to something that vrp could only use to ensure 'this' was non-NULL. Exposing _M_offset as an automatic variable/gimple register outside the unevaluated assume operand enabled the optimizer to do its job. Rather than placing such load-then-assume constructs all over, I introduced an always-inline member function in bit iterators that does the job of conveying to the compiler the information that the assumption is supposed to hold, and various calls throughout functions pertaining to bit iterators that might not otherwise know that the offsets have to be in range, so that the compiler no longer needs to make conservative assumptions that prevent optimizations. With the explicit assumptions, the compiler can correlate the test for available storage in the vector with the test for how much storage might need to be copied, and determine that, if we're not asking for enough room for two or more words, we can omit entirely the code to copy two or more words, without any runtime overhead whatsoever: no traces remain of the undefined behavior or of the tests that inform the compiler about the assumptions that must hold. for libstdc++-v3/ChangeLog PR libstdc++/110807 * include/bits/stl_bvector.h (_Bit_iterator_base): Add _M_assume_normalized member function. Call it in _M_bump_up, _M_bump_down, _M_incr, operator==, operator<=>, operator<, and operator-. (_Bit_iterator): Also call it in operator*. (_Bit_const_iterator): Likewise. --- libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index 8d18bcaffd434..2b91af2005f2d 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -56,6 +56,10 @@ #ifndef _STL_BVECTOR_H #define _STL_BVECTOR_H 1 +#ifndef _GLIBCXX_ALWAYS_INLINE +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__)) +#endif + #if __cplusplus >= 201103L #include #include @@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _Bit_type * _M_p; unsigned int _M_offset; + _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE + void + _M_assume_normalized() const + { + unsigned int __ofst = _M_offset; + __attribute__ ((__assume__ (__ofst < unsigned(_S_word_bit)))); + } + _GLIBCXX20_CONSTEXPR _Bit_iterator_base(_Bit_type * __x, unsigned int __y) : _M_p(__x), _M_offset(__y) { } @@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_bump_up() { + _M_assume_normalized(); if (_M_offset++ == int(_S_word_bit) - 1) { _M_offset = 0; @@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_bump_down() { + _M_assume_normalized(); if (_M_offset-- == 0) { _M_offset = int(_S_word_bit) - 1; @@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER void _M_incr(ptrdiff_t __i) { + _M_assume_normalized(); difference_type __n = __i + _M_offset; _M_p += __n / int(_S_word_bit); __n = __n % int(_S_word_bit); @@ -221,7 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD friend _GLIBCXX20_CONSTEXPR bool operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) - { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; } + { + __x._M_assume_normalized(); + __y._M_assume_normalized(); + return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; + } #if __cpp_lib_three_way_comparison [[nodiscard]] @@ -229,6 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) noexcept { + __x._M_assume_normalized(); + __y._M_assume_normalized(); if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0) return __cmp; return __x._M_offset <=> __y._M_offset; @@ -238,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER friend bool operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) { + __x._M_assume_normalized(); + __y._M_assume_normalized(); return __x._M_p < __y._M_p || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset); } @@ -266,6 +289,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER friend _GLIBCXX20_CONSTEXPR ptrdiff_t operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y) { + __x._M_assume_normalized(); + __y._M_assume_normalized(); return (int(_S_word_bit) * (__x._M_p - __y._M_p) + __x._M_offset - __y._M_offset); } @@ -297,7 +322,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR reference operator*() const - { return reference(_M_p, 1UL << _M_offset); } + { + _M_assume_normalized(); + return reference(_M_p, 1UL << _M_offset); + } _GLIBCXX20_CONSTEXPR iterator& @@ -408,7 +436,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR const_reference operator*() const - { return _Bit_reference(_M_p, 1UL << _M_offset); } + { + _M_assume_normalized(); + return _Bit_reference(_M_p, 1UL << _M_offset); + } _GLIBCXX20_CONSTEXPR const_iterator& -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive