* [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() @ 2023-05-01 7:06 Kefu Chai 2023-05-01 7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai 2023-05-01 20:04 ` [PATCH v1 " Jonathan Wakely 0 siblings, 2 replies; 9+ messages in thread From: Kefu Chai @ 2023-05-01 7:06 UTC (permalink / raw) To: libstdc++; +Cc: Kefu Chai This patch fixes the false alarm when performing the check introduced by bf78b43873b0b7e8f9a430df38749b8b61f9c9b8 . A minimal reproducer can be found at https://godbolt.org/z/7q4nG68xn I am pasting the reproducer here just in case: #include <cstring> #include <string> #include <sstream> int main() { unsigned char buf[sizeof(std::string)] ; std::memset(buf, 0xff, sizeof(buf)); const char s[] = "1234567890abcdefg"; std::istringstream in{s}; std::istreambuf_iterator<char> it{in}, end; auto* p = new (buf) std::string(it, end); return 0; } Tested on x86_64-pc-linux-gnu, with "make check-target-libstdc++-v3". Would be great if this change can be included in thunk, 13.2 and 12 branches. Kefu Chai (1): libstdc++: Set _M_string_length before calling _M_dispose() libstdc++-v3/include/bits/basic_string.tcc | 1 + 1 file changed, 1 insertion(+) -- 2.40.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v1 1/1] libstdc++: Set _M_string_length before calling _M_dispose() 2023-05-01 7:06 [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() Kefu Chai @ 2023-05-01 7:06 ` Kefu Chai 2023-05-02 14:06 ` Jonathan Wakely 2023-05-03 2:17 ` [PATCH v2 0/1] " Kefu Chai 2023-05-01 20:04 ` [PATCH v1 " Jonathan Wakely 1 sibling, 2 replies; 9+ messages in thread From: Kefu Chai @ 2023-05-01 7:06 UTC (permalink / raw) To: libstdc++; +Cc: Kefu Chai, Kefu Chai This always sets _M_string_length in the constructor specialized for range of input_iterator, for the cases like istringstream. We copy from source range to the local buffer, and then reallocate to larger one if necessary, when disposing the old buffer. And the old buffer could be provisioned by the local buffer or an allocated buffer. _M_is_local() is used to tell if the buffer is the local one or not. In addition to comparing the buffer address with the local buffer, this function also performs the sanity check if _M_string_length is greater than _S_local_capacity, if the check fails __builtin_unreachable() is called. But we failed to set _M_string_length in this constructor is specialized for std::input_iterator. So, if UBSan is enabled when compiling the source, there are chances that the uninitialized data in _M_string_length is greater than _S_local_capacity, and the application aborts a runtime error or exception emitted by the UBSan. In this change, to avoid the false alarm, _M_string_length is updated with the length of number of bytes copied to local buffer, so that _M_is_local() is able to check based on the correct length. This issue only surfaces when constructing a string with a range of input_iterator, and the uninitialized _M_string_length is greater than _S_local_capacity, i.e., 15. libstdc++-v3/ChangeLog: * include/bits/basic_string.tcc (_M_construct): Set _M_string_length before calling _M_dispose(). Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> --- libstdc++-v3/include/bits/basic_string.tcc | 1 + 1 file changed, 1 insertion(+) diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc index 99fdbeee5ad..ec2198ee20b 100644 --- a/libstdc++-v3/include/bits/basic_string.tcc +++ b/libstdc++-v3/include/bits/basic_string.tcc @@ -177,6 +177,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __p[__len++] = *__beg; ++__beg; } + _M_length(__len); struct _Guard { -- 2.40.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 1/1] libstdc++: Set _M_string_length before calling _M_dispose() 2023-05-01 7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai @ 2023-05-02 14:06 ` Jonathan Wakely 2023-05-03 2:17 ` [PATCH v2 0/1] " Kefu Chai 1 sibling, 0 replies; 9+ messages in thread From: Jonathan Wakely @ 2023-05-02 14:06 UTC (permalink / raw) To: Kefu Chai; +Cc: libstdc++, Kefu Chai [-- Attachment #1.1: Type: text/plain, Size: 2957 bytes --] On Mon, 1 May 2023 at 08:06, Kefu Chai via Libstdc++ <libstdc++@gcc.gnu.org> wrote: > This always sets _M_string_length in the constructor specialized for > range of input_iterator, for the cases like istringstream. > > We copy from source range to the local buffer, and then reallocate > to larger one if necessary, when disposing the old buffer. And the > old buffer could be provisioned by the local buffer or an allocated > buffer. _M_is_local() is used to tell if the buffer is the local one > or not. In addition to comparing the buffer address with the local buffer, > this function also performs the sanity check if _M_string_length is > greater than _S_local_capacity, if the check fails > __builtin_unreachable() is called. But we failed to set _M_string_length > in this constructor is specialized for std::input_iterator. So, > if UBSan is enabled when compiling the source, there are chances that > the uninitialized data in _M_string_length is greater than > _S_local_capacity, and the application aborts a runtime error or > exception emitted by the UBSan. > > In this change, to avoid the false alarm, _M_string_length is > updated with the length of number of bytes copied to local buffer, so > that _M_is_local() is able to check based on the correct length. > > This issue only surfaces when constructing a string with a range of > input_iterator, and the uninitialized _M_string_length is greater than > _S_local_capacity, i.e., 15. > > libstdc++-v3/ChangeLog: > > * include/bits/basic_string.tcc (_M_construct): Set > _M_string_length before calling _M_dispose(). > > Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> > --- > libstdc++-v3/include/bits/basic_string.tcc | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libstdc++-v3/include/bits/basic_string.tcc > b/libstdc++-v3/include/bits/basic_string.tcc > index 99fdbeee5ad..ec2198ee20b 100644 > --- a/libstdc++-v3/include/bits/basic_string.tcc > +++ b/libstdc++-v3/include/bits/basic_string.tcc > @@ -177,6 +177,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > __p[__len++] = *__beg; > ++__beg; > } > + _M_length(__len); > > struct _Guard > { > -- > 2.40.1 > Thanks for finding the issue, and providing the patch. N.B. patches for libstdc++ (like all GCC patches) need to be sent to the gcc-patches list. For libstdc++ they should also be sent to the libstdc++ list (as you did) but they still need to be CC'd to gcc-patches. I think I'd prefer to do it this way instead, just initializing to zero at the start of the constructor. What do you think? Init'ing to an immediate zero should always be cheap. For the ForwardIterator case the initialization can probably be eliminated as a dead store. For the InputIterator case it avoids the bug (it doesn't matter that _M_string_length isn't correct when _M_is_local() checks it, only that it's not uninitialized to a value greater than the local capacity). [-- Attachment #2: patch.txt --] [-- Type: text/plain, Size: 2458 bytes --] commit 8b173f993683acdb5ca8048a0a229660c18f45ab Author: Kefu Chai <kefu.chai@scylladb.com> Date: Mon May 1 21:24:26 2023 libstdc++: Set _M_string_length before calling _M_dispose() This always sets _M_string_length in the constructor specialized for range of input_iterator, for the cases like istringstream. We copy from source range to the local buffer, and then reallocate to larger one if necessary, when disposing the old buffer. And the old buffer could be provisioned by the local buffer or an allocated buffer. _M_is_local() is used to tell if the buffer is the local one or not. In addition to comparing the buffer address with the local buffer, this function also performs the sanity check if _M_string_length is greater than _S_local_capacity, if the check fails __builtin_unreachable() is called. But we failed to set _M_string_length in this constructor is specialized for std::input_iterator. So, if UBSan is enabled when compiling the source, there are chances that the uninitialized data in _M_string_length is greater than _S_local_capacity, and the application aborts a runtime error or exception emitted by the UBSan. In this change, to avoid the false alarm, _M_string_length is initialized to zero before doing anything else, so that _M_is_local() doesn't see an uninitialized value. This issue only surfaces when constructing a string with a range of input_iterator, and the uninitialized _M_string_length is greater than _S_local_capacity, i.e., 15. libstdc++-v3/ChangeLog: * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)): Initialize _M_string_length. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Co-authored-by: Jonathan Wakely <jwakely@redhat.com> diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 8247ee6bdc6..b16b2898b62 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -760,7 +760,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _GLIBCXX20_CONSTEXPR basic_string(_InputIterator __beg, _InputIterator __end, const _Alloc& __a = _Alloc()) - : _M_dataplus(_M_local_data(), __a) + : _M_dataplus(_M_local_data(), __a), _M_string_length(0) { #if __cplusplus >= 201103L _M_construct(__beg, __end, std::__iterator_category(__beg)); ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() 2023-05-01 7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai 2023-05-02 14:06 ` Jonathan Wakely @ 2023-05-03 2:17 ` Kefu Chai 2023-05-03 2:17 ` [PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] Kefu Chai 2023-05-03 12:22 ` [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() Jonathan Wakely 1 sibling, 2 replies; 9+ messages in thread From: Kefu Chai @ 2023-05-03 2:17 UTC (permalink / raw) To: libstdc++; +Cc: gcc-patches, Kefu Chai Hi Jonathan, Thank you for your review and suggestion. The change looks great! Assigning a value with an immediate zero is indeed much faster. in v2: * revised the commit message a little bit, I found it a little bit difficult to parse when re-reading it. * associated the commit with PR/libstdc++/109703. as I just filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109706, which turns out to be a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109703 The rest of the v2 patch is identical to the one attached in your reply. Would you please taking another look? Kefu Chai (1): libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] libstdc++-v3/include/bits/basic_string.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.40.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] 2023-05-03 2:17 ` [PATCH v2 0/1] " Kefu Chai @ 2023-05-03 2:17 ` Kefu Chai 2023-05-03 12:22 ` [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() Jonathan Wakely 1 sibling, 0 replies; 9+ messages in thread From: Kefu Chai @ 2023-05-03 2:17 UTC (permalink / raw) To: libstdc++; +Cc: gcc-patches, Kefu Chai, Kefu Chai, Jonathan Wakely This patch always sets _M_string_length in the constructor specialized for range of input_iterator, for the cases like istringstream. We copy from source range to the local buffer, and then reallocate to a larger one if necessary. When disposing the old buffer, the old buffer could be provisioned by the local buffer or an allocated buffer. _M_is_local() is used to tell if the buffer is the local one or not. In addition to comparing the buffer address with the local buffer, this function also performs the sanity checking if _M_string_length is greater than _S_local_capacity, if the check fails __builtin_unreachable() is called. But we failed to set _M_string_length in this constructor is specialized for std::input_iterator. So, if UBSan is enabled when compiling the source, there are chances that the uninitialized data in _M_string_length is greater than _S_local_capacity, and the application aborts a runtime error or exception emitted by the UBSan. In this change, to avoid the false alarm, _M_string_length is initialized to zero before doing anything else, so that _M_is_local() doesn't see an uninitialized value. This issue only surfaces when constructing a string with a range of input_iterator, and the uninitialized _M_string_length is greater than _S_local_capacity, i.e., 15. libstdc++-v3/ChangeLog: PR libstdc++/109703 * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)): Initialize _M_string_length. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Co-authored-by: Jonathan Wakely <jwakely@redhat.com> --- libstdc++-v3/include/bits/basic_string.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 8247ee6bdc6..b16b2898b62 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -760,7 +760,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _GLIBCXX20_CONSTEXPR basic_string(_InputIterator __beg, _InputIterator __end, const _Alloc& __a = _Alloc()) - : _M_dataplus(_M_local_data(), __a) + : _M_dataplus(_M_local_data(), __a), _M_string_length(0) { #if __cplusplus >= 201103L _M_construct(__beg, __end, std::__iterator_category(__beg)); -- 2.40.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() 2023-05-03 2:17 ` [PATCH v2 0/1] " Kefu Chai 2023-05-03 2:17 ` [PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] Kefu Chai @ 2023-05-03 12:22 ` Jonathan Wakely 2023-05-03 12:30 ` kefu chai 1 sibling, 1 reply; 9+ messages in thread From: Jonathan Wakely @ 2023-05-03 12:22 UTC (permalink / raw) To: Kefu Chai; +Cc: libstdc++, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1300 bytes --] On Wed, 3 May 2023 at 03:17, Kefu Chai via Libstdc++ <libstdc++@gcc.gnu.org> wrote: > Hi Jonathan, > > Thank you for your review and suggestion. The change looks great! > Assigning a value with an immediate zero is indeed much faster. > > in v2: > > * revised the commit message a little bit, I found it a little bit > difficult to parse when re-reading it. > * associated the commit with PR/libstdc++/109703. as I just filed > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109706, which turns out > to be a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109703 > > The rest of the v2 patch is identical to the one attached in your reply. > > Would you please taking another look? > Thanks. I've pushed it to trunk as cbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2 and will backport it too. I altered the commit msg again, because "input_iterator" is a C++20 concept, and here we're just talking about types meeting the old C[[17InputIterator requirements, not types modelling the concept. I also added references to the commit and PR that added the __builtin_unreachable(). > Kefu Chai (1): > libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] > > libstdc++-v3/include/bits/basic_string.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > -- > 2.40.1 > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() 2023-05-03 12:22 ` [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() Jonathan Wakely @ 2023-05-03 12:30 ` kefu chai 0 siblings, 0 replies; 9+ messages in thread From: kefu chai @ 2023-05-03 12:30 UTC (permalink / raw) To: Jonathan Wakely; +Cc: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1594 bytes --] Le mer. 3 mai 2023 à 20:22, Jonathan Wakely <jwakely@redhat.com> a écrit : > > > On Wed, 3 May 2023 at 03:17, Kefu Chai via Libstdc++ < > libstdc++@gcc.gnu.org> wrote: > >> Hi Jonathan, >> >> Thank you for your review and suggestion. The change looks great! >> Assigning a value with an immediate zero is indeed much faster. >> >> in v2: >> >> * revised the commit message a little bit, I found it a little bit >> difficult to parse when re-reading it. >> * associated the commit with PR/libstdc++/109703. as I just filed >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109706, which turns out >> to be a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109703 >> >> The rest of the v2 patch is identical to the one attached in your reply. >> >> Would you please taking another look? >> > > Thanks. I've pushed it to trunk as > cbf6c7a1d16490a1e63e9a5ce00e9a5c44c4c2f2 and will backport it too. > > I altered the commit msg again, because "input_iterator" is a C++20 > concept, and here we're just talking about types meeting the old > C[[17InputIterator requirements, not types modelling the concept. I also > added references to the commit and PR that added the > __builtin_unreachable(). > Awesome! The commit message is much better now. Thank you very much for your help! > > >> Kefu Chai (1): >> libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] >> >> libstdc++-v3/include/bits/basic_string.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> -- >> 2.40.1 >> >> -- Regards Kefu Chai ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() 2023-05-01 7:06 [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() Kefu Chai 2023-05-01 7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai @ 2023-05-01 20:04 ` Jonathan Wakely 2023-05-01 20:05 ` Jonathan Wakely 1 sibling, 1 reply; 9+ messages in thread From: Jonathan Wakely @ 2023-05-01 20:04 UTC (permalink / raw) To: Kefu Chai; +Cc: libstdc++ [-- Attachment #1: Type: text/plain, Size: 1122 bytes --] On Mon, 1 May 2023 at 08:06, Kefu Chai via Libstdc++ <libstdc++@gcc.gnu.org> wrote: > This patch fixes the false alarm when performing the check introduced by > bf78b43873b0b7e8f9a430df38749b8b61f9c9b8 . A minimal reproducer can be > found at https://godbolt.org/z/7q4nG68xn > > I am pasting the reproducer here just in case: > > #include <cstring> > #include <string> > #include <sstream> > > int main() { > unsigned char buf[sizeof(std::string)] ; > std::memset(buf, 0xff, sizeof(buf)); > const char s[] = "1234567890abcdefg"; > std::istringstream in{s}; > std::istreambuf_iterator<char> it{in}, end; > auto* p = new (buf) std::string(it, end); > return 0; > } > > Tested on x86_64-pc-linux-gnu, with "make check-target-libstdc++-v3". > Would be great if this change can be included in thunk, 13.2 and 12 > branches. > The bf78b43873b0b7e8f9a430df38749b8b61f9c9b8 change isn't on the gcc-12 branch. > Kefu Chai (1): > libstdc++: Set _M_string_length before calling _M_dispose() > > libstdc++-v3/include/bits/basic_string.tcc | 1 + > 1 file changed, 1 insertion(+) > > -- > 2.40.1 > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() 2023-05-01 20:04 ` [PATCH v1 " Jonathan Wakely @ 2023-05-01 20:05 ` Jonathan Wakely 0 siblings, 0 replies; 9+ messages in thread From: Jonathan Wakely @ 2023-05-01 20:05 UTC (permalink / raw) To: Kefu Chai; +Cc: libstdc++ [-- Attachment #1: Type: text/plain, Size: 1337 bytes --] On Mon, 1 May 2023 at 21:04, Jonathan Wakely <jwakely@redhat.com> wrote: > > > On Mon, 1 May 2023 at 08:06, Kefu Chai via Libstdc++ < > libstdc++@gcc.gnu.org> wrote: > >> This patch fixes the false alarm when performing the check introduced by >> bf78b43873b0b7e8f9a430df38749b8b61f9c9b8 . A minimal reproducer can be >> found at https://godbolt.org/z/7q4nG68xn >> >> I am pasting the reproducer here just in case: >> >> #include <cstring> >> #include <string> >> #include <sstream> >> >> int main() { >> unsigned char buf[sizeof(std::string)] ; >> std::memset(buf, 0xff, sizeof(buf)); >> const char s[] = "1234567890abcdefg"; >> std::istringstream in{s}; >> std::istreambuf_iterator<char> it{in}, end; >> auto* p = new (buf) std::string(it, end); >> return 0; >> } >> >> Tested on x86_64-pc-linux-gnu, with "make check-target-libstdc++-v3". >> Would be great if this change can be included in thunk, 13.2 and 12 >> branches. >> > > The bf78b43873b0b7e8f9a430df38749b8b61f9c9b8 change isn't on the gcc-12 > branch. > Oh wait, yes it is - I was thinking of another change that I didn't backport, sorry. > > >> Kefu Chai (1): >> libstdc++: Set _M_string_length before calling _M_dispose() >> >> libstdc++-v3/include/bits/basic_string.tcc | 1 + >> 1 file changed, 1 insertion(+) >> >> -- >> 2.40.1 >> >> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-05-03 12:31 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-05-01 7:06 [PATCH v1 0/1] Set _M_string_length before calling _M_dispose() Kefu Chai 2023-05-01 7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai 2023-05-02 14:06 ` Jonathan Wakely 2023-05-03 2:17 ` [PATCH v2 0/1] " Kefu Chai 2023-05-03 2:17 ` [PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] Kefu Chai 2023-05-03 12:22 ` [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() Jonathan Wakely 2023-05-03 12:30 ` kefu chai 2023-05-01 20:04 ` [PATCH v1 " Jonathan Wakely 2023-05-01 20:05 ` Jonathan Wakely
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).