From: Jonathon Anderson
Date: Fri, 08 Nov 2019 15:29:00 -0000
Subject: Re: [PATCH] lib + libdw: Add and use a concurrent version of the dynamic-size hash table.
To: Mark Wielaard
Cc: elfutils-devel@sourceware.org, Srdan Milakovic
Message-Id: <1573226934.2134.0@rice.edu>
References: <20191104163932.21891-1-jma14@rice.edu> <27820824019c4966ca81c7d65d6c68581633aa60.camel@klomp.org> <1573140272.2173.0@rice.edu>

On Fri, Nov 8, 2019 at 15:07, Mark Wielaard wrote:
> Hi,
>
> On Thu, 2019-11-07 at 09:24 -0600, Jonathon Anderson wrote:
>> On Thu, Nov 7, 2019 at 12:07, Mark Wielaard wrote:
>> > Looking at the difference between the previous version and this
>> > one, it incorporates my simplification of the FIND and lookup
>> > functions. And fixes it by making insert_helper consistently
>> > return -1 when the value was already inserted (hash == hval). And
>> > it fixes an issue where we were using the table entry val_ptr
>> > instead of the hashval as hash (was that the typo? it didn't look
>> > harmless).
>>
>> Yep, those are the changes (plus the Sig8 patch). That typo was
>> harmless because hash would be overwritten before its next use (or
>> just unused); now with the (hash == hval) clause it's always read,
>> so the typo is fixed.
>
> Thanks for explaining. I have now finally pushed this to master.
>
>> Regarding the Sig8 table: I took a close look, and at the moment its
>> use is in an area that isn't thread-safe anyway (in particular,
>> __libdw_intern_next_unit).
>> Depending on how that is parallelized there might be an issue (if
>> it's just wrapped with a separate mutex a thread might "miss" a CU
>> if it's not already in the table), but since that region would need
>> inspection at that time anyway I'm fine with either option.
>
> I still kept the code to handle the Sig8 table with the new
> concurrent-safe code, since I think it is better if we use the new
> code always (even in the single threaded case).
>
> So to fix this we do need some mutex to protect the binary search
> tree when calling tsearch/tfind? Or do you see other issues too?

The search tree can be handled with a mutex; the issue is with
next_{tu,cu}_offset and the general logic of the function. As an
example: suppose two threads look in the Sig8 table for A and see that
it's not currently present. They'll both use __libdw_intern_next_unit
to load CUs (or units, I suppose) until they either find it or run out
of units. If the entirety of intern_next_unit is wrapped in a mutex,
one of the two will load in the missing entry and finish, while the
other has "missed" it and will keep going until no units remain.

The easy solution is to have the threads check the table again on
next_unit failure for whether the entry has been added, but that
incurs a large-ish overhead from the constant reprobing. The easiest
way around that is to add an interface on the Sig8 table that returns
a "probe" on lookup failure that can be continued with only a handful
of atomics (essentially saving state from the first find). The
downside to this approach is that unit parsing is fully serialized.

If the next_*_offset is handled with a separate mutex or as an atomic
(say, using fetch_add), the same issue occurs, but without the mutex
there's no guarantee that another thread isn't currently parsing and
will write the entry soon, so the easy solution doesn't work. Since
the Sig8 key is only known after the parsing is complete, we can't
even insert an "in progress" entry.
One solution is to allow for duplicate parsing (but then next_*_offset
would have to be updated *after* Sig8_Hash_insert); another is to use
a condition variable on whether all the units have been parsed (so
threads that don't find what they're looking for would block until
it's certain that it doesn't exist). Both are viable directions, but
neither is trivial.

>
>> This isn't an issue for Dwarf_Abbrev; the worst that can happen
>> there is a duplicate alloc and parsing (as long as the DWARF
>> doesn't have erroneous abbrev entries; if it does we would need to
>> thread-safe the error handling too).
>
> Unfortunately we don't always control the data, so bad abbrev
> entries could happen. The extra alloc wouldn't really "leak" because
> it would be freed with the Dwarf. So I am not too concerned about
> that. Is that the worst that can happen in __libdw_getabbrev? When
> we goto invalid the Dwarf_Abbrev would indeed "leak", but it isn't
> really lost, it will get cleaned up when the Dwarf is destroyed.

It wouldn't "leak", but it would be taking up space until the
dwarf_end. Not that I mind (they're really small). I'm thinking more
of the case where the Abbrev_insert returns -1 (entry collision); in
that case the new Abbrev would stick around until the dwarf_end.

> Thanks,
>
> Mark