public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: Kefu Chai <tchaikov@gmail.com>
Cc: libstdc++@gcc.gnu.org, Kefu Chai <kefu.chai@scylladb.com>
Subject: Re: [PATCH v1 1/1] libstdc++: Set _M_string_length before calling _M_dispose()
Date: Tue, 2 May 2023 15:06:32 +0100	[thread overview]
Message-ID: <CACb0b4kr4b56NsEGpHPDgkZhbgtmkKaWgbh6oHe_LhDbr09MMQ@mail.gmail.com> (raw)
In-Reply-To: <20230501070622.847749-2-tchaikov@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2957 bytes --]

On Mon, 1 May 2023 at 08:06, Kefu Chai via Libstdc++ <libstdc++@gcc.gnu.org>
wrote:

> This always sets _M_string_length in the constructor specialized for
> range of input_iterator, for the cases like istringstream.
>
> We copy from source range to the local buffer, and then reallocate
> to larger one if necessary, when disposing the old buffer. And the
> old buffer could be provisioned by the local buffer or an allocated
> buffer. _M_is_local() is used to tell if the buffer is the local one
> or not. In addition to comparing the buffer address with the local buffer,
> this function also performs the sanity check if _M_string_length is
> greater than _S_local_capacity, if the check fails
> __builtin_unreachable() is called. But we failed to set _M_string_length
> in this constructor is specialized for std::input_iterator. So,
> if UBSan is enabled when compiling the source, there are chances that
> the uninitialized data in _M_string_length is greater than
> _S_local_capacity, and the application aborts a runtime error or
> exception emitted by the UBSan.
>
> In this change, to avoid the false alarm, _M_string_length is
> updated with the length of number of bytes copied to local buffer, so
> that _M_is_local() is able to check based on the correct length.
>
> This issue only surfaces when constructing a string with a range of
> input_iterator, and the uninitialized _M_string_length is greater than
> _S_local_capacity, i.e., 15.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/basic_string.tcc (_M_construct): Set
>         _M_string_length before calling _M_dispose().
>
> Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
> ---
>  libstdc++-v3/include/bits/basic_string.tcc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.tcc
> b/libstdc++-v3/include/bits/basic_string.tcc
> index 99fdbeee5ad..ec2198ee20b 100644
> --- a/libstdc++-v3/include/bits/basic_string.tcc
> +++ b/libstdc++-v3/include/bits/basic_string.tcc
> @@ -177,6 +177,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>             __p[__len++] = *__beg;
>             ++__beg;
>           }
> +       _M_length(__len);
>
>         struct _Guard
>         {
> --
> 2.40.1
>

Thanks for finding the issue, and providing the patch.

N.B. patches for libstdc++ (like all GCC patches) need to be sent to the
gcc-patches list. For libstdc++ they should also be sent to the libstdc++
list (as you did) but they still need to be CC'd to gcc-patches.

I think I'd prefer to do it this way instead, just initializing to zero at
the start of the constructor. What do you think?

Init'ing to an immediate zero should always be cheap. For the
ForwardIterator case the initialization can probably be eliminated as a
dead store. For the InputIterator case it avoids the bug (it doesn't matter
that _M_string_length isn't correct when _M_is_local() checks it, only that
it's not uninitialized to a value greater than the local capacity).

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 2458 bytes --]

commit 8b173f993683acdb5ca8048a0a229660c18f45ab
Author: Kefu Chai <kefu.chai@scylladb.com>
Date:   Mon May 1 21:24:26 2023

    libstdc++: Set _M_string_length before calling _M_dispose()
    
    This always sets _M_string_length in the constructor specialized for
    range of input_iterator, for the cases like istringstream.
    
    We copy from source range to the local buffer, and then reallocate
    to larger one if necessary, when disposing the old buffer. And the
    old buffer could be provisioned by the local buffer or an allocated
    buffer. _M_is_local() is used to tell if the buffer is the local one
    or not. In addition to comparing the buffer address with the local buffer,
    this function also performs the sanity check if _M_string_length is
    greater than _S_local_capacity, if the check fails
    __builtin_unreachable() is called. But we failed to set _M_string_length
    in this constructor is specialized for std::input_iterator. So,
    if UBSan is enabled when compiling the source, there are chances that
    the uninitialized data in _M_string_length is greater than
    _S_local_capacity, and the application aborts a runtime error or
    exception emitted by the UBSan.
    
    In this change, to avoid the false alarm, _M_string_length is
    initialized to zero before doing anything else, so that _M_is_local()
    doesn't see an uninitialized value.
    
    This issue only surfaces when constructing a string with a range of
    input_iterator, and the uninitialized _M_string_length is greater than
    _S_local_capacity, i.e., 15.
    
    libstdc++-v3/ChangeLog:
    
            * include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
            Initialize _M_string_length.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    Co-authored-by: Jonathan Wakely <jwakely@redhat.com>

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 8247ee6bdc6..b16b2898b62 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -760,7 +760,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	_GLIBCXX20_CONSTEXPR
         basic_string(_InputIterator __beg, _InputIterator __end,
 		     const _Alloc& __a = _Alloc())
-	: _M_dataplus(_M_local_data(), __a)
+	: _M_dataplus(_M_local_data(), __a), _M_string_length(0)
 	{
 #if __cplusplus >= 201103L
 	  _M_construct(__beg, __end, std::__iterator_category(__beg));

  reply	other threads:[~2023-05-02 14:06 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-01  7:06 [PATCH v1 0/1] " Kefu Chai
2023-05-01  7:06 ` [PATCH v1 1/1] libstdc++: " Kefu Chai
2023-05-02 14:06   ` Jonathan Wakely [this message]
2023-05-03  2:17   ` [PATCH v2 0/1] " Kefu Chai
2023-05-03  2:17     ` [PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703] Kefu Chai
2023-05-03 12:22     ` [PATCH v2 0/1] Set _M_string_length before calling _M_dispose() Jonathan Wakely
2023-05-03 12:30       ` kefu chai
2023-05-01 20:04 ` [PATCH v1 " Jonathan Wakely
2023-05-01 20:05   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACb0b4kr4b56NsEGpHPDgkZhbgtmkKaWgbh6oHe_LhDbr09MMQ@mail.gmail.com \
    --to=jwakely@redhat.com \
    --cc=kefu.chai@scylladb.com \
    --cc=libstdc++@gcc.gnu.org \
    --cc=tchaikov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).