Date: Tue, 01 Sep 2015 14:27:00 -0000
From: Jonathan Wakely
To: Dmitry Vyukov
Cc: GCC Patches, libstdc++@gcc.gnu.org, Alexander Potapenko, Kostya Serebryany, Torvald Riegel
Subject: Re: [Patch, libstdc++] Fix data races in basic_string implementation
Message-ID: <20150901142713.GG2631@redhat.com>

On 01/09/15 14:51 +0200, Dmitry Vyukov wrote:
>Hello,
>
>The refcounted basic_string implementation contains several data races
>on _M_refcount:
>1. _M_is_leaked loads _M_refcount concurrently with mutations of
>_M_refcount. This load needs to be a memory_order_relaxed load, as the
>_M_is_leaked predicate does not change under our feet.
>2. _M_is_shared loads _M_refcount concurrently with mutations of
>_M_refcount. This load needs to be memory_order_acquire, as another
>thread can drop _M_refcount to zero concurrently, which makes the
>string non-shared and so the current thread can mutate the string. We
>need reads of the string in another thread (while it was shared) to
>happen-before the writes to the string in this thread (now that it is
>non-shared).
>
>This patch adds a __gnu_cxx::__atomic_load_dispatch function to do the
>loads of _M_refcount. The function does an acquire load. Acquire is
>not needed for _M_is_leaked, but for simplicity we still do an acquire
>(hopefully the refcounted impl will go away in future).

It's unlikely to go away for a long time.

>This patch also uses the new function to do loads of _M_refcount in
>the string implementation.
>
>I did not update doc/xml/manual/concurrency_extensions.xml to document
>__gnu_cxx::__atomic_load_dispatch, because I am not sure we want to
>extend that API now that we have language-level atomics. If you still
>want me to update it, please say how to regenerate
>doc/html/manual/ext_concurrency.html.

I don't know if we need the new __atomic_load_dispatch function at
all, we could just use atomic loads directly e.g.
   bool
   _M_is_leaked() const _GLIBCXX_NOEXCEPT
+#if defined(__GTHREADS) && defined(_GLIBCXX_ATOMIC_BUILTINS)
+  { return __atomic_load_n(&this->_M_refcount, __ATOMIC_RELAXED) < 0; }
+#else
   { return this->_M_refcount < 0; }
+#endif

   bool
   _M_is_shared() const _GLIBCXX_NOEXCEPT
+#if defined(__GTHREADS) && defined(_GLIBCXX_ATOMIC_BUILTINS)
+  { return __atomic_load_n(&this->_M_refcount, __ATOMIC_ACQUIRE) > 0; }
+#else
   { return this->_M_refcount > 0; }
+#endif

The __atomic_xxx_dispatch functions check __gthread_active_p() as an
optimisation to avoid potentially expensive atomic stores, but I don't
think we need that for atomic loads here, especially the relaxed load.
We could always add the dispatch in later if we get complaints about
single-threaded performance on targets where the loads are expensive.

This doesn't fix the problem for targets that don't define
_GLIBCXX_ATOMIC_BUILTINS, but I don't know how many of them there are.
We could make it work on more targets by adding a new configure check
just for __atomic_load(int*, ...), because _GLIBCXX_ATOMIC_BUILTINS
requires several builtins to support various object sizes, but here we
don't need all of that.
>The race was detected with ThreadSanitizer on the following program:
>
>#define _GLIBCXX_USE_CXX11_ABI 0
>#include <string>
>#include <thread>
>#include <chrono>
>#include <iostream>
>
>int main() {
>  std::string s = "foo";
>  std::thread t([=](){
>    std::string x = s;
>    std::cout << &x << std::endl;
>  });
>  std::this_thread::sleep_for(std::chrono::seconds(1));
>  std::cout << &s[0] << std::endl;
>  t.join();
>}
>
>$ g++ string.cc -std=c++11 -pthread -O1 -g -fsanitize=thread
>$ ./a.out
>
>WARNING: ThreadSanitizer: data race (pid=98135)
>  Read of size 4 at 0x7d080000efd0 by main thread:
>    #0 std::string::_Rep::_M_is_leaked() const
>include/c++/6.0.0/bits/basic_string.h:2605 (a.out+0x000000401d7c)
>    #1 std::string::_M_leak()
>include/c++/6.0.0/bits/basic_string.h:2730 (a.out+0x000000401d7c)
>    #2 std::string::operator[](unsigned long)
>include/c++/6.0.0/bits/basic_string.h:3274 (a.out+0x000000401d7c)
>    #3 main /tmp/string.cc:14 (a.out+0x000000401d7c)
>
>  Previous atomic write of size 4 at 0x7d080000efd0 by thread T1:
>    #0 __tsan_atomic32_fetch_add
>../../../../libsanitizer/tsan/tsan_interface_atomic.cc:611
>(libtsan.so.0+0x00000005811f)
>    #1 __exchange_and_add include/c++/6.0.0/ext/atomicity.h:59
>(a.out+0x000000401a19)
>    #2 __exchange_and_add_dispatch
>include/c++/6.0.0/ext/atomicity.h:92 (a.out+0x000000401a19)
>    #3 std::string::_Rep::_M_dispose(std::allocator const&)
>include/c++/6.0.0/bits/basic_string.h:2659 (a.out+0x000000401a19)
>    #4 std::basic_string<char, std::char_traits<char>,
>std::allocator<char> >::~basic_string()
>include/c++/6.0.0/bits/basic_string.h:2961 (a.out+0x000000401a19)
>    #5 ~ /tmp/string.cc:9 (a.out+0x000000401a19)
>    #6 ~_Head_base include/c++/6.0.0/tuple:102 (a.out+0x000000401a19)
>    #7 ~_Tuple_impl include/c++/6.0.0/tuple:338 (a.out+0x000000401a19)
>    #8 ~tuple include/c++/6.0.0/tuple:521 (a.out+0x000000401a19)
>    #9 ~_Bind_simple include/c++/6.0.0/functional:1503 (a.out+0x000000401a19)
>    #10 ~_Impl include/c++/6.0.0/thread:107 (a.out+0x000000401a19)
>    #11 destroy()> >> >
>include/c++/6.0.0/ext/new_allocator.h:124
>(a.out+0x0000004015c7)
>    #12 _S_destroy()> >> >, std::thread::_Impl()> > >
>include/c++/6.0.0/bits/alloc_traits.h:236 (a.out+0x0000004015c7)
>    #13 destroy()> >> >
>include/c++/6.0.0/bits/alloc_traits.h:336 (a.out+0x0000004015c7)
>    #14 _M_dispose include/c++/6.0.0/bits/shared_ptr_base.h:529
>(a.out+0x0000004015c7)
>    #15 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/shared_ptr_base.h:150
>(libstdc++.so.6+0x0000000b6da4)
>    #16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/shared_ptr_base.h:662
>(libstdc++.so.6+0x0000000b6da4)
>    #17 std::__shared_ptr(__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/shared_ptr_base.h:928
>(libstdc++.so.6+0x0000000b6da4)
>    #18 std::shared_ptr::~shared_ptr()
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/shared_ptr.h:93
>(libstdc++.so.6+0x0000000b6da4)
>    #19 execute_native_thread_routine
>libstdc++-v3/src/c++11/thread.cc:79 (libstdc++.so.6+0x0000000b6da4)
>
>  Location is heap block of size 28 at 0x7d080000efc0 allocated by main thread:
>    #0 operator new(unsigned long)
>../../../../libsanitizer/tsan/tsan_interceptors.cc:571
>(libtsan.so.0+0x0000000255a3)
>    #1 __gnu_cxx::new_allocator::allocate(unsigned long, void const*)
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:104
>(libstdc++.so.6+0x0000000cbaa8)
>    #2 std::string::_Rep::_S_create(unsigned long, unsigned long,
>std::allocator const&)
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:1051
>(libstdc++.so.6+0x0000000cbaa8)
>    #3 __libc_start_main (libc.so.6+0x000000021ec4)
>
>  Thread T1 (tid=98137, finished) created by main
>thread at:
>    #0 pthread_create
>../../../../libsanitizer/tsan/tsan_interceptors.cc:895
>(libtsan.so.0+0x000000026d04)
>    #1 __gthread_create
>/usr/local/google/home/dvyukov/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:662
>(libstdc++.so.6+0x0000000b6e52)
>    #2 std::thread::_M_start_thread(std::shared_ptr, void (*)())
>libstdc++-v3/src/c++11/thread.cc:149 (libstdc++.so.6+0x0000000b6e52)
>    #3 __libc_start_main (libc.so.6+0x000000021ec4)
>
>OK for trunk?

>Index: include/bits/basic_string.h
>===================================================================
>--- include/bits/basic_string.h (revision 227363)
>+++ include/bits/basic_string.h (working copy)
>@@ -2601,11 +2601,11 @@
>
>       bool
>       _M_is_leaked() const _GLIBCXX_NOEXCEPT
>-      { return this->_M_refcount < 0; }
>+      { return __gnu_cxx::__atomic_load_dispatch(&this->_M_refcount) < 0; }
>
>       bool
>       _M_is_shared() const _GLIBCXX_NOEXCEPT
>-      { return this->_M_refcount > 0; }
>+      { return __gnu_cxx::__atomic_load_dispatch(&this->_M_refcount) > 0; }
>
>       void
>       _M_set_leaked() _GLIBCXX_NOEXCEPT
>Index: include/ext/atomicity.h
>===================================================================
>--- include/ext/atomicity.h (revision 227363)
>+++ include/ext/atomicity.h (working copy)
>@@ -35,6 +35,16 @@
> #include
> #include
>
>+// Even if the CPU doesn't need a memory barrier, we need to ensure
>+// that the compiler doesn't reorder memory accesses across the
>+// barriers.
>+#ifndef _GLIBCXX_READ_MEM_BARRIER
>+#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence (__ATOMIC_ACQUIRE)
>+#endif
>+#ifndef _GLIBCXX_WRITE_MEM_BARRIER
>+#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence (__ATOMIC_RELEASE)
>+#endif
>+
> namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
> {
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>@@ -50,7 +60,7 @@
>
>   static inline void
>   __atomic_add(volatile _Atomic_word* __mem, int __val)
>-  { __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
>+  { __atomic_fetch_add(__mem, __val, __ATOMIC_RELEASE); }
> #else
>   _Atomic_word
>   __attribute__ ((__unused__))
>@@ -101,17 +111,27 @@
> #endif
>   }
>
>+  static inline _Atomic_word
>+  __attribute__ ((__unused__))
>+  __atomic_load_dispatch(const _Atomic_word* __mem)
>+  {
>+#ifdef __GTHREADS
>+    if (__gthread_active_p())
>+      {
>+#ifdef _GLIBCXX_ATOMIC_BUILTINS
>+        return __atomic_load_n(__mem, __ATOMIC_ACQUIRE);
>+#else
>+        // The best we can get with an old compiler.

We don't support old compilers in libstdc++ trunk, so this comment is
misleading. The fallback is needed for targets without support for all
the builtins, not for old compilers.

>+        _Atomic_word v = *(volatile _Atomic_word*)__mem;
>+        _GLIBCXX_READ_MEM_BARRIER;

If a target doesn't define _GLIBCXX_ATOMIC_BUILTINS do we know it will
define __atomic_thread_fence? I guess those targets should be
redefining _GLIBCXX_READ_MEM_BARRIER anyway.

>+        return v;
>+#endif
>+      }
>+#endif
>+    return *__mem;
>+  }
>+
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace
>
>-// Even if the CPU doesn't need a memory barrier, we need to ensure
>-// that the compiler doesn't reorder memory accesses across the
>-// barriers.
>-#ifndef _GLIBCXX_READ_MEM_BARRIER
>-#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence (__ATOMIC_ACQUIRE)
>-#endif
>-#ifndef _GLIBCXX_WRITE_MEM_BARRIER
>-#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence (__ATOMIC_RELEASE)
>-#endif
>-
> #endif