From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 31535 invoked by alias); 19 Nov 2004 20:21:59 -0000 Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Archive: List-Post: List-Help: Sender: gcc-help-owner@gcc.gnu.org Received: (qmail 31477 invoked from network); 19 Nov 2004 20:21:47 -0000 Received: from unknown (HELO mailhost.cas.org) (134.243.50.9) by sourceware.org with SMTP; 19 Nov 2004 20:21:47 -0000 Received: from cas.org (pmd25awu [134.243.216.44]) by mailhost.cas.org (8.12.10/8.12.10/CAS_MAIL_HUB-4.7) with ESMTP id iAJKLhN9004379; Fri, 19 Nov 2004 15:21:46 -0500 (EST) (envelope-from pdubuc@cas.org) Message-ID: <419E55D6.6030803@cas.org> Date: Fri, 19 Nov 2004 20:21:00 -0000 From: Paul Dubuc Organization: Chemical Abstracts Service (CAS) User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0 MIME-Version: 1.0 Newsgroups: gnu.g++.help,gnu.g++,gnu.g++.bug To: libstdc++@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-help@gcc.gnu.org Subject: Subtle MT problem with __gnu_cxx::hash_map Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2004-11/txt/msg00131.txt.bz2 There's a subtle thread safety problem with hash_map that was found in our testing recently. It's understood that operator[] is a non-const method since it can insert an element into a container if one is not found with the given index. In our case we were using operator[] to access a hash_map that had been fully populated. Each index we were using did have an entry in the hash_map. So we were accessing elements in the map with operator[] using multiple threads thinking that this would be a thread safe, const operation. This is implied by the statement "simultaneous read accesses to to shared containers are safe" in the SGI STL user's guide (http://www.sgi.com/tech/stl/thread_safety.html). This worked for a long time until we tried it on a hash_map with 389 elements. The program crashed. A search through the source found that 389 is one of the size values as which a hash_map will expand: In ext/stl_hashtable.h (for GCC 3.3.4, ext/hashtable.h for GCC 3.4.3): static const unsigned long __stl_prime_list[__stl_num_primes] = { 53ul, 97ul, 193ul, 389ul, 769ul, 1543ul, 3079ul, 6151ul, 12289ul, 24593ul, 49157ul, 98317ul, 196613ul, 393241ul, 786433ul, 1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul, 50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul, 1610612741ul, 3221225473ul, 4294967291ul }; Sure enough the problem happens with hash_tables of other sizes in this table. The find_or_insert() method in stl_hashtable.h seems to call resize() before it really needs to. The problem could be fixed by moving the call to resize to be after the first return statement: template typename hashtable<_Val,_Key,_HF,_Ex,_Eq,_All>::reference hashtable<_Val,_Key,_HF,_Ex,_Eq,_All>::find_or_insert(const value_type& __obj) { resize(_M_num_elements + 1); // <- FROM HERE size_type __n = _M_bkt_num(__obj); _Node* __first = _M_buckets[__n]; for (_Node* __cur = __first; __cur; __cur = __cur->_M_next) if (_M_equals(_M_get_key(__cur->_M_val), _M_get_key(__obj))) return __cur->_M_val; // <-- TO HERE _Node* __tmp = _M_new_node(__obj); __tmp->_M_next = __first; _M_buckets[__n] = __tmp; ++_M_num_elements; return __tmp->_M_val; } One could argue that find() should always used to access the hash_map, but using operator[] is often more succinct and readable in complex code. The problem is easy to fall into, difficult to diagnose, hard to reproduce (unless you know the magic prime numbers) and apparently easy to fix (unless I'm missing something). Could this be changed in the next release of the library? It might fix or prevent some time bombs out in the field. Thanks, Paul Dubuc