From: Jonathan Wakely
Date: Thu, 9 Nov 2023 01:22:13 +0000
Subject: Re: [PATCH v2] libstdc++: optimize bit iterators assuming normalization [PR110807]
To: Alexandre Oliva
Cc: Jonathan Wakely, gcc-patches, "libstdc++"
On Thu, 9 Nov 2023, 01:17 Alexandre Oliva, wrote:

> On Nov 8, 2023, Jonathan Wakely wrote:
>
> > A single underscore prefix on __GLIBCXX_BUILTIN_ASSUME and
> > __GLIBCXX_DISABLE_ASSUMPTIONS please.
>
> That's entirely gone now.
>
> >> +      do                                                \
> >> +        if (std::is_constant_evaluated ())              \
> >> +          static_assert(expr);                          \
>
> > This can never be valid.
>
> *nod*
>
> > This already works fine in constant evaluation anyway.
>
> Yeah, that's what I figured.
>
> > But what's the null dereference for?
>
> The idea was to clearly trigger undefined behavior.  Maybe it wasn't
> needed, it didn't occur to me that __builtin_unreachable() would be
> enough.  I realize I was really trying to emulate attribute assume,
> even without knowing it existed ;-)
>
> >> +#define __GLIBCXX_BUILTIN_ASSUME(expr)          \
> >> +  (void)(false && (expr))
>
> > What's the point of this, just to verify that (expr) is contextually
> > convertible to bool?
>
> I'd have phrased it as "avoid the case in which something compiles
> with -O0 but not with -O", but yeah ;-)
>
> > We don't use the _p suffix for predicates in the library.
> > Please use just _M_normalized or _M_is_normalized.
>
> ACK.  It's also gone now.
>
> > But do we even need this function?  It's not used anywhere else,
> > can we just inline the condition into _M_assume_normalized() ?
>
> I had other uses for it in earlier versions of the patch, but it
> makes no sense any more indeed.
>
> >> +      _GLIBCXX20_CONSTEXPR
> >> +      void
> >> +      _M_assume_normalized() const
>
> > I think this should use _GLIBCXX_ALWAYS_INLINE
>
> *nod*, thanks
>
> >> +      {
> >> +        __GLIBCXX_BUILTIN_ASSUME (_M_normalized_p ());
>
> > Is there even any benefit to this macro?
>
> I just thought it could have other uses, without being aware that the
> entire concept was available as a statement attribute.  Funny, I'd
> even searched for it among the existing attributes and builtins, but
> somehow I managed to miss it.  Thanks for getting me back down that
> path.
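For readers following along, the `__builtin_unreachable()` idiom mentioned above (the portable way to express an assumption before GCC 13 added the `assume` statement attribute) can be sketched as a minimal standalone example. The names `assume_true` and `mod_word_bit` are hypothetical illustrations, not libstdc++ code:

```cpp
#include <cassert>

// Portable "assume" idiom: if the condition is false, control would
// reach __builtin_unreachable(), so the optimizer may treat the
// condition as always true on every path that follows.
inline void assume_true(bool cond)
{
  if (!cond)
    __builtin_unreachable();
}

// With the assumption in place, the compiler can fold offset % 32
// to just offset, since offset is known to be below 32.
unsigned mod_word_bit(unsigned offset)
{
  assume_true(offset < 32);
  return offset % 32;
}
```

If the asserted condition is ever false at runtime, behavior is undefined, exactly as with the null-dereference trick discussed above; the unreachable form just states that intent directly.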
> > __attribute__((__assume__(_M_offset < unsigned(_S_word_bit))));
>
> That unfortunately doesn't work, because the assume lowering doesn't
> go as far as dereferencing the implicit this and making an SSA_NAME
> out of the loaded _M_offset, which we'd need to be able to optimize
> based on it.  But that only took me a while to figure out and massage
> into something that had the desired effect.  Now, maybe the above
> *should* have that effect already, but unfortunately it doesn't.
>
> > Maybe even get rid of _M_assume_normalized() as a function and just
> > put that attribute everywhere you currently use
> > _M_assume_normalized.
>
> Because of the slight kludge required to make the attribute have the
> desired effect (namely ensuring the _M_offset reference is
> evaluated), I've retained it as an inline function.
>
> Here's what I'm retesting now.  WDYT?
>

ofst needs to be __ofst but OK for trunk with that change.

We probably want this on the gcc-13 branch too, but let's give it some
time on trunk in case the assume attribute isn't quite ready for prime
time.

>
> libstdc++: optimize bit iterators assuming normalization [PR110807]
>
> The representation of bit iterators, using a pointer into an array of
> words, and an unsigned bit offset into that word, makes for some
> optimization challenges: because the compiler doesn't know that the
> offset is always in a certain narrow range, beginning at zero and
> ending before the word bitwidth, when a function loads an offset that
> it hasn't normalized itself, it may fail to derive certain reasonable
> conclusions, even to the point of retaining useless calls that elicit
> incorrect warnings.
>
> Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
> list to a global bit vector variable.  Based on the compile-time
> constant length of the list, we decide in _M_insert_range whether to
> use the existing storage or to allocate new storage for the vector.
> After allocation, we decide in _M_copy_aligned how to copy any
> preexisting portions of the vector to the newly-allocated storage.
> When copying two or more words, we use __builtin_memmove.
>
> However, because we compute the available room using bit offsets
> without range information, even comparing them with constants, we
> fail to infer ranges for the preexisting vector depending on word
> size, and may thus retain the memmove call despite knowing we've only
> allocated one word.
>
> Other parts of the compiler then detect the mismatch between the
> constant allocation size and the much larger range that could
> theoretically be copied into the newly-allocated storage if we could
> reach the call.
>
> Ensuring the compiler is aware of the constraints on the offset range
> enables it to do a much better job at optimizing.  Using attribute
> assume (_M_offset <= ...) didn't work, because gimple lowered that to
> something that vrp could only use to ensure 'this' was non-NULL.
> Exposing _M_offset as an automatic variable/gimple register outside
> the unevaluated assume operand enabled the optimizer to do its job.
>
> Rather than placing such load-then-assume constructs all over, I
> introduced an always-inline member function in bit iterators that
> does the job of conveying to the compiler the information that the
> assumption is supposed to hold, and various calls throughout
> functions pertaining to bit iterators that might not otherwise know
> that the offsets have to be in range, so that the compiler no longer
> needs to make conservative assumptions that prevent optimizations.
>
> With the explicit assumptions, the compiler can correlate the test
> for available storage in the vector with the test for how much
> storage might need to be copied, and determine that, if we're not
> asking for enough room for two or more words, we can omit entirely
> the code to copy two or more words, without any runtime overhead
> whatsoever: no traces remain of the undefined behavior or of the
> tests that inform the compiler about the assumptions that must hold.
>
>
> for  libstdc++-v3/ChangeLog
>
>         PR libstdc++/110807
>         * include/bits/stl_bvector.h (_Bit_iterator_base): Add
>         _M_assume_normalized member function.  Call it in _M_bump_up,
>         _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
>         operator-.
>         (_Bit_iterator): Also call it in operator*.
>         (_Bit_const_iterator): Likewise.
> ---
>  libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
> index 8d18bcaffd434..bb9f38cdf2d59 100644
> --- a/libstdc++-v3/include/bits/stl_bvector.h
> +++ b/libstdc++-v3/include/bits/stl_bvector.h
> @@ -56,6 +56,10 @@
>  #ifndef _STL_BVECTOR_H
>  #define _STL_BVECTOR_H 1
>
> +#ifndef _GLIBCXX_ALWAYS_INLINE
> +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
> +#endif
> +
>  #if __cplusplus >= 201103L
>  #include
>  #include
> @@ -177,6 +181,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>    _Bit_type * _M_p;
>    unsigned int _M_offset;
>
> +  _GLIBCXX20_CONSTEXPR _GLIBCXX_ALWAYS_INLINE
> +  void
> +  _M_assume_normalized() const
> +  {
> +    unsigned int ofst = _M_offset;
> +    __attribute__ ((__assume__ (ofst < unsigned(_S_word_bit))));
> +  }
> +
>    _GLIBCXX20_CONSTEXPR
>    _Bit_iterator_base(_Bit_type * __x, unsigned int __y)
>    : _M_p(__x), _M_offset(__y) { }
> @@ -185,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_bump_up()
>      {
> +      _M_assume_normalized();
>        if (_M_offset++ == int(_S_word_bit) - 1)
>          {
>            _M_offset = 0;
> @@ -196,6 +209,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_bump_down()
>      {
> +      _M_assume_normalized();
>        if (_M_offset-- == 0)
>          {
>            _M_offset = int(_S_word_bit) - 1;
> @@ -207,6 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      void
>      _M_incr(ptrdiff_t __i)
>      {
> +      _M_assume_normalized();
>        difference_type __n = __i + _M_offset;
>        _M_p += __n / int(_S_word_bit);
>        __n = __n % int(_S_word_bit);
> @@ -221,7 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD
>      friend _GLIBCXX20_CONSTEXPR bool
>      operator==(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
> -    { return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset; }
> +    {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
> +      return __x._M_p == __y._M_p && __x._M_offset == __y._M_offset;
> +    }
>
>  #if __cpp_lib_three_way_comparison
>      [[nodiscard]]
> @@ -229,6 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      operator<=>(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      noexcept
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        if (const auto __cmp = __x._M_p <=> __y._M_p; __cmp != 0)
>          return __cmp;
>        return __x._M_offset <=> __y._M_offset;
> @@ -238,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      friend bool
>      operator<(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        return __x._M_p < __y._M_p
>          || (__x._M_p == __y._M_p && __x._M_offset < __y._M_offset);
>      }
> @@ -266,6 +289,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      friend _GLIBCXX20_CONSTEXPR ptrdiff_t
>      operator-(const _Bit_iterator_base& __x, const _Bit_iterator_base& __y)
>      {
> +      __x._M_assume_normalized();
> +      __y._M_assume_normalized();
>        return (int(_S_word_bit) * (__x._M_p - __y._M_p)
>                + __x._M_offset - __y._M_offset);
>      }
> @@ -297,7 +322,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>      reference
>      operator*() const
> -    { return reference(_M_p, 1UL << _M_offset); }
> +    {
> +      _M_assume_normalized();
> +      return reference(_M_p, 1UL << _M_offset);
> +    }
>
>      _GLIBCXX20_CONSTEXPR
>      iterator&
> @@ -408,7 +436,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>      _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>      const_reference
>      operator*() const
> -    { return _Bit_reference(_M_p, 1UL << _M_offset); }
> +    {
> +      _M_assume_normalized();
> +      return _Bit_reference(_M_p, 1UL << _M_offset);
> +    }
>
>      _GLIBCXX20_CONSTEXPR
>      const_iterator&
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
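The load-then-assume pattern the patch settles on, copying the member into a local so the assumption attaches to the loaded value rather than to `this`, can be sketched in miniature. This sketch uses the portable `__builtin_unreachable()` idiom in place of the GCC 13 `assume` attribute so it compiles with older compilers; `BitIter` and `WORD_BIT` are simplified stand-ins, not the real `_Bit_iterator_base`:

```cpp
#include <cassert>

enum { WORD_BIT = 32 };  // stand-in for _S_word_bit

struct BitIter           // simplified stand-in for _Bit_iterator_base
{
  unsigned long* p;      // word pointer
  unsigned offset;       // invariant: offset < WORD_BIT

  // Load the member into a local first, as the patch does, so the
  // assumption applies to the loaded value (an SSA name) rather than
  // merely implying that 'this' is non-null.
  __attribute__((__always_inline__))
  void assume_normalized() const
  {
    unsigned ofst = offset;
    if (ofst >= WORD_BIT)
      __builtin_unreachable();  // GCC/Clang builtin
  }

  // Counterpart of _M_bump_up: advance one bit, wrapping to the next
  // word when the offset reaches the top bit of the current word.
  void bump_up()
  {
    assume_normalized();
    if (offset++ == WORD_BIT - 1)
      {
        offset = 0;
        ++p;
      }
  }
};
```

With the assumption in scope, the optimizer knows `offset + 1` cannot exceed `WORD_BIT`, which is the kind of range fact the patch needs to eliminate the spurious oversized-memmove path described in the commit message.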