From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d]) by sourceware.org (Postfix) with ESMTPS id 3ADED3858C50; Fri, 29 Sep 2023 16:46:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3ADED3858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x52d.google.com with SMTP id 4fb4d7f45d1cf-5362bcc7026so3550242a12.1; Fri, 29 Sep 2023 09:46:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1696006010; x=1696610810; darn=gcc.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=gpIlsyRP/3/PZ4wY/9poPm/3iRGDyMh/EWD8ZmJ8T3w=; b=c3zu8BNsunSNpJe7F/5/ugHluv7XZmuuyjRE1vv+pT2dtKEITjkGXqJQWY7XLp2rVL 0zfk/ltG4+/l/3n9bCUgI05dWuiqVjAtD6PxxGW/hVso3dFs7XHDj+fZqdlmSmEwN4sq nFAhTv1tbZ7K4ZERgPMUPa3q51ottzXxni3k9wzYHDeog7z+RO9aHAKlOExh5Qf+ZUJa bWecRyclpa5Sg2PllI4ClD55w8yIK35+1US0xiQPV/IIemsImYEG9dHhV5CmcI9FdxnZ fj636Xkv3MmGepa6g89MGUg/CoQKygtwmVShpvuRpWog8mmPObtqZp7qz4iUheutXyYW SkCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696006010; x=1696610810; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=gpIlsyRP/3/PZ4wY/9poPm/3iRGDyMh/EWD8ZmJ8T3w=; b=qejLDgGzVbGi29EIVZF3wXByrRDCaYI45MRc4uSbMJYCcEyWCo2svXArsfrb4oqFbj s4+KCDfdTHcvsnntsrEUVJwO7J0hNBcLpKnK+HHYxULqU9ebDml6Q/ONB0gz/+hlUvCD rVKuod+dyGKdbWqK7h46cOWbknYb53PGpct7QWJMmSCFDy+bDBqBqMjYgXI4pNCEM3+z h6BPYn/aDS4i+5iPedFTgY9qbrd/+ph3I2cp7w5DrsOmnn3U4OGleswWSPfHGloY/O9L PjUMahk8ppmabcK1soMj94PCeOrxFm1m2Qt0agJbMjkCZC2V08ronyWZH5sOJhl/41eQ eAAA== X-Gm-Message-State: AOJu0YxZykuAAfY4EEHbEDzQ87kwT46+3+y+VWbK+gHLfa3RbUWCa5mo rh9n5IV/Q39NQ2pxQSaCjShEsUgfhVjPjZYvjlQ= X-Google-Smtp-Source: AGHT+IE16myszs7xnPR1HCnBXLYUOlSMpgs9W8dpldCpY05FM9necGuX5lEN5zf/EfnDmvJQ7hMNbXuxmoTWT7m+JHg= X-Received: by 2002:a17:906:74ca:b0:9b2:94a8:df5 with SMTP id z10-20020a17090674ca00b009b294a80df5mr4251757ejl.35.1696006009597; Fri, 29 Sep 2023 09:46:49 -0700 (PDT) MIME-Version: 1.0 References: <65160b60.170a0220.894a0.06bc@mx.google.com> <6516fb58.170a0220.cb8af.508f@mx.google.com> In-Reply-To: <6516fb58.170a0220.cb8af.508f@mx.google.com> From: Jonathan Wakely Date: Fri, 29 Sep 2023 17:46:37 +0100 Message-ID: Subject: Re: [PATCH] libstdc++: Ensure active union member is correctly set To: Nathaniel Shead Cc: Jonathan Wakely , Jason Merrill , "libstdc++" , gcc-patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 29 Sept 2023 at 17:29, Nathaniel Shead wrote: > > On Fri, Sep 29, 2023 at 04:06:33PM +0100, Jonathan Wakely wrote: > > On Fri, 29 Sept 2023 at 10:32, Jonathan Wakely wrote: > > > > Thanks for the comments, here's an updated version of the patch. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > > Great, I'll get this committed today - thanks! > > > > That's done now. > > > > Thanks! > > > > > > > > > I'll note that there are some existing calls to `_M_use_local_data()` > > > > already used only for their side effects without a cast to void, e.g. > > > > > > > > /** > > > > * @brief Default constructor creates an empty string. > > > > */ > > > > _GLIBCXX20_CONSTEXPR > > > > basic_string() > > > > _GLIBCXX_NOEXCEPT_IF(is_nothrow_default_constructible<_Alloc>::value) > > > > : _M_dataplus(_M_local_data()) > > > > { > > > > _M_use_local_data(); > > > > _M_set_length(0); > > > > } > > > > > > > > I haven't updated these, but should this be changed for consistency? > > > > > > Yes, good idea. I can do that. > > > > I started to do that, and decided it made more sense to split out the > > constexpr loop from _M_use_local_data() into a separate function, > > _M_init_local_buf(). Then we can use that for all the places where we > > don't care about the return value. That avoids the overhead of using > > pointer_traits::pointer_to when we don't need the return value (which > > is admittedly only going to be an issue for -O0 code, but I think it's > > cleaner this way anyway). > > > > Please see the attached patch and let me know what you think. > > I agree, and it also looks clearer to me what is happening. Good, I'll make this change next week then. > > > > > > Thanks again for fixing these. I think this might fix some bug reports > > > about clang rejecting our std::string in constant expressions, so I'll > > > check those. > > > > Your patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110900 > > (so we should backport it to gcc-13 and gcc-12 too). > > > commit 2668979d3206ff6c039ac0165aae29377a15666c > > Author: Jonathan Wakely > > Date: Fri Sep 29 12:12:22 2023 > > > > libstdc++: Split std::basic_string::_M_use_local_data into two functions > > > > This splits out the activate-the-union-member-for-constexpr logic from > > _M_use_local_data, so that it can be used separately in cases that don't > > need to use std::pointer_traits::pointer_to to obtain the > > return value. > > > > This leaves only three uses of _M_use_local_data() which are all the > > same form: > > > > __s._M_data(_M_use_local_data()); > > __s._M_set_length(0); > > > > We could remove _M_use_local_data() and change those three places to use > > a new _M_reset() function that does: > > > > _M_init_local_buf(); > > _M_data(_M_local_data()); > > _M_set_length(0); > > > > This is left for a future change. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/basic_string.h (_M_init_local_buf()): New > > function. > > (_M_use_local_data()): Use _M_init_local_buf. > > (basic_string(), basic_string(const Alloc&)) > > (basic_string(basic_string&&)) > > (basic_string(basic_string&&, const Alloc&)): Use > > _M_init_local_buf instead of _M_use_local_data(). > > * include/bits/basic_string.tcc (swap(basic_string&)) > > (_M_construct(InIter, InIter, forward_iterator_tag)) > > (_M_construct(size_type, CharT), reserve()): Likewise. > > (_M_construct(InIter, InIter, input_iterator_tag)): Remove call > > to _M_use_local_data() and initialize the local buffer directly. > > > > diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h > > index 4f94cd967cf..18a19b8dcbc 100644 > > --- a/libstdc++-v3/include/bits/basic_string.h > > +++ b/libstdc++-v3/include/bits/basic_string.h > > @@ -353,13 +353,23 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > // Ensure that _M_local_buf is the active member of the union. > > __attribute__((__always_inline__)) > > _GLIBCXX14_CONSTEXPR > > - pointer > > - _M_use_local_data() _GLIBCXX_NOEXCEPT > > + void > > + _M_init_local_buf() _GLIBCXX_NOEXCEPT > > { > > #if __cpp_lib_is_constant_evaluated > > if (std::is_constant_evaluated()) > > for (size_type __i = 0; __i <= _S_local_capacity; ++__i) > > _M_local_buf[__i] = _CharT(); > > +#endif > > + } > > + > > + __attribute__((__always_inline__)) > > + _GLIBCXX14_CONSTEXPR > > + pointer > > + _M_use_local_data() _GLIBCXX_NOEXCEPT > > + { > > +#if __glibcxx_is_constant_evaluated > > + _M_init_local_buf(); > > #endif > > return _M_local_data(); > > } > > What's the difference between __cpp_lib_is_constant_evaluated and > __glibcxx_is_constant_evaluated? Should these lines be using the same > macro here? Ah, yeah, they could be the same. The difference is that (for most feature test macros) the __glibcxx_ftm form is defined after including , but the __cpp_lib_ftm form is only defined only by headers that do: #define __glibcxx_want_ftm #include This means we can ensure that the __cpp_lib_ftm form is only defined by headers where we actually want to define it, e.g. in the headers that [version.syn] in the standard says should define the macro. So for our own internal uses, we should generally rely on the __glibcxx_ftm one. Users should rely on __cpp_lib_ftm after including the correct header. For example, __glibcxx_atomic_wait is defined after including and so is available for our own uses, but __cpp_lib_atomic_wait is only defined after including , not just . Or at least, that's the plan - I have a pending patch to make everything I just said true :-) Currently we "leak" the __cpp_lib_ftm forms into lots of internal headers. That will change next week. In this specific case, it doesn't matter, because __cpp_lib_is_constant_evaluated is defined by , and every header includes that. So it doesn't really matter whether our internal uses are __cpp_lib_is_constant_evaluated or __glibcxx_is_constant_evaluated. But it would be good to be consistent. > > > @@ -522,7 +532,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > _GLIBCXX_NOEXCEPT_IF(is_nothrow_default_constructible<_Alloc>::value) > > : _M_dataplus(_M_local_data()) > > { > > - _M_use_local_data(); > > + _M_init_local_buf(); > > _M_set_length(0); > > } > > > > @@ -534,7 +544,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > basic_string(const _Alloc& __a) _GLIBCXX_NOEXCEPT > > : _M_dataplus(_M_local_data(), __a) > > { > > - _M_use_local_data(); > > + _M_init_local_buf(); > > _M_set_length(0); > > } > > > > @@ -678,7 +688,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > { > > if (__str._M_is_local()) > > { > > - (void)_M_use_local_data(); > > + _M_init_local_buf(); > > traits_type::copy(_M_local_buf, __str._M_local_buf, > > __str.length() + 1); > > } > > @@ -718,7 +728,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > { > > if (__str._M_is_local()) > > { > > - (void)_M_use_local_data(); > > + _M_init_local_buf(); > > traits_type::copy(_M_local_buf, __str._M_local_buf, > > __str.length() + 1); > > _M_length(__str.length()); > > diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc > > index 4bc98f2aea7..052eeb9e846 100644 > > --- a/libstdc++-v3/include/bits/basic_string.tcc > > +++ b/libstdc++-v3/include/bits/basic_string.tcc > > @@ -79,7 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > } > > else if (__s.length()) > > { > > - (void)_M_use_local_data(); > > + _M_init_local_buf(); > > traits_type::copy(_M_local_buf, __s._M_local_buf, > > __s.length() + 1); > > _M_length(__s.length()); > > @@ -88,7 +88,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > } > > else if (length()) > > { > > - (void)__s._M_use_local_data(); > > + __s._M_init_local_buf(); > > traits_type::copy(__s._M_local_buf, _M_local_buf, > > length() + 1); > > __s._M_length(length()); > > @@ -99,7 +99,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > else > > { > > const size_type __tmp_capacity = __s._M_allocated_capacity; > > - (void)__s._M_use_local_data(); > > + __s._M_init_local_buf(); > > traits_type::copy(__s._M_local_buf, _M_local_buf, > > length() + 1); > > _M_data(__s._M_data()); > > @@ -111,7 +111,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > const size_type __tmp_capacity = _M_allocated_capacity; > > if (__s._M_is_local()) > > { > > - (void)_M_use_local_data(); > > + _M_init_local_buf(); > > traits_type::copy(_M_local_buf, __s._M_local_buf, > > __s.length() + 1); > > __s._M_data(_M_data()); > > @@ -174,14 +174,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > size_type __len = 0; > > size_type __capacity = size_type(_S_local_capacity); > > > > - pointer __p = _M_use_local_data(); > > - > > while (__beg != __end && __len < __capacity) > > { > > - __p[__len++] = *__beg; > > + _M_local_buf[__len++] = *__beg; > > ++__beg; > > } > > > > +#if __glibcxx_is_constant_evaluated > > + if (std::is_constant_evaluated()) > > + for (size_type __i = __len; __i <= __capacity; ++__i) > > + _M_local_buf[__i] = _CharT(); > > +#endif > > + > > I wonder if maybe this should still be a call to `_M_init_local_buf()` > above, where the `_M_use_local_data()` used to be? That way the logic > stays in one place, and I don't imagine the compile time savings of not > immediately overwriting the first __len characters would be significant. Yeah, I went back and forth on that, but you're probably right. I'll do as you suggested.