public inbox for gcc-prs@sourceware.org
help / color / mirror / Atom feed
From: jkanze@caicheuvreuse.com
To: gcc-gnats@gcc.gnu.org
Subject: libstdc++/10350: thread-safety problem in std::string.
Date: Tue, 08 Apr 2003 16:16:00 -0000	[thread overview]
Message-ID: <20030408161221.5483.qmail@sources.redhat.com> (raw)


>Number:         10350
>Category:       libstdc++
>Synopsis:       thread-safety problem in std::string.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Apr 08 16:16:01 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     James Kanze
>Release:        g++ 3.2.2
>Organization:
>Environment:

>Description:
I believe I've found an error in the thread management in std::string in
the library.  Basically, the problem occurs in the following context:

  - There is a global variable declared:
        std::string s ;
    This variable is initialised by reading from a configuration file,
    in main, before any threads have been initialized.  It is non-const
    so that this initialization can occur; it is never modified
    afterwards.  Since it is never modified once threads have been
    started, no accesses are protected by mutex's or anything else.

  - A copy of this variable is made to another string variable.  This
    causes the reference count _M_references in basic_string::_Rep to
    become 1, with the result that _Rep::_M_is_shared will return true.

  - Several threads are started.  At exactly the same moment, in thread
    A and thread B, an expression along the lines of "s[ i ] == 'x'" is
    executed.

    Note that because the instance is not const, the function called is
    the non-const version of basic_string::operator[], even though
    nobody is going to modify anything. (And because nobody is modifying
    the string in any of the threads, there are no locks.) This function
    begins by calling _M_leak; since _Rep isn't yet leaked, this
    function calls _M_leak_hard, and since the _Rep is shared, we end up
    in _M_mutate. _M_mutate, after calculating a lot of different sizes
    (which aren't really relevant here), executes an if, which evaluates
    true.  The last two lines of the if are the critical section:
    imagine that thread A has been executing up until now, and it is
    interrupted by thread B.  At this point, thread A has not modified
    anything in the actual object -- it has made a series of tests,
    decided that it must isolate the representation, and has prepared a
    new representation, but it has not started installing it.  Thread B
    executes the same code, and since the actual object has not been
    changed, comes to the same conclusions, and arrives at the same
    point.  From this point, several scenarios are possible:  thread B
    continues, calling dispose on the _Rep in the actual object (which,
    since there is still another object using it, will not be
    destructed), installs its own newly created _Rep and returns a
    reference to the data in it.  At this point, thread A takes over
    again, calls _M_rep()->_M_dispose, which disposes of the _Rep
    thread B has just installed -- since the just installed _Rep is not
    shared it is deleted, and the reference in B is invalid.

For the moment, I'm not convinced that a copy on write implementation of
the std::string interface is possible with only the primitives
atomic_incr, atomic_decr, barrier_read and barrier_write -- the current
g++ implementation tries to do it with even less.  (Note that the
current implementation fails lamentably in a multi-processor
environment; there is no effort what so ever made to ensure that writes
are actually seen by the other processes.  But I've not much experience
in evaluating this sort of problem.)

Might I suggest that the current implementation be completely scrapped,
in favor of a simple deep copy.  The results will be somewhat slower,
but at least they will work.  Alternatively, if you have some mechanism
of switching implementations depending on the -mt switch of the
compiler...  (I've found COW to be a definite win in my single threaded
applications.)
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:


             reply	other threads:[~2003-04-08 16:16 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-08 16:16 jkanze [this message]
2003-05-02 21:36 Fred Channey
2003-05-02 21:46 Fred Channey
2003-05-05 11:46 John Love-Jensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030408161221.5483.qmail@sources.redhat.com \
    --to=jkanze@caicheuvreuse.com \
    --cc=gcc-gnats@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).