From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 34707 invoked by alias); 23 Aug 2019 22:36:50 -0000 Mailing-List: contact elfutils-devel-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Post: List-Help: List-Subscribe: Sender: elfutils-devel-owner@sourceware.org Received: (qmail 34677 invoked by uid 89); 23 Aug 2019 22:36:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.100.3 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-13.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=HTo:U*mark X-Spam-Status: No, score=-13.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on sourceware.org X-Spam-Level: X-HELO: mx0b-0010f301.pphosted.com Received: from mx0b-0010f301.pphosted.com (HELO mx0b-0010f301.pphosted.com) (148.163.153.244) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 23 Aug 2019 22:36:46 +0000 Received: from pps.filterd (m0102859.ppops.net [127.0.0.1]) by mx0b-0010f301.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id x7NMagew032740; Fri, 23 Aug 2019 17:36:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rice.edu; h=date : from : subject : to : cc : message-id : in-reply-to : references : mime-version : content-type; s=ricemail; bh=Qh5yPkCXBAccejSFB26yKF6akKBFE24ipHzIWQb3oNk=; b=AXaC53BJrV3SnXqBhgWpvCPX0XY2W2rlQMpZZLnDiub4iZOUpKCHZXZ8y8Gl+qGN6Q40 +Jrsyq0Q59Zla/ZxX4KdGPqYK30+P4oizUt1bmiIftIIKV33X8sCJEPUECnnj1wl/5UX KwTvOcQotuHq3r0NNkz8HWJuMaWpfBbFpeDV8xikQjoaNu+Q5R99Sp6rDY09ZObUx6P9 sAhW+lJ7M8Y/qVgjldRScbQtc9FQ4TQwfADwo4X7JVrl1AX8d5N0dye83G6UyGhThdnP U53udLFe9DI06dyonyEiGIrqH1tuvUvh19T0hagn9oIZ0Zt6CymN0U5Wds5pyYG5Cevk dg== Received: from mh2.mail.rice.edu (mh2.mail.rice.edu [128.42.201.21]) by mx0b-0010f301.pphosted.com with ESMTP id 2ugu6dkxpb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 23 Aug 2019 17:36:42 -0500 Received-X: from mh2.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh2.mail.rice.edu (Postfix) with ESMTP id D9735500405; Fri, 23 Aug 2019 17:36:41 -0500 (CDT) Received-X: from mh2.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh2.mail.rice.edu (Postfix) with ESMTP id D666F500400; Fri, 23 Aug 2019 17:36:41 -0500 (CDT) X-Virus-Scanned: by amavis-2.7.0 at mh2.mail.rice.edu, auth channel Received-X: from mh2.mail.rice.edu ([127.0.0.1]) by mh2.mail.rice.edu (mh2.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id 20DwL7ZB0Ku2; Fri, 23 Aug 2019 17:36:41 -0500 (CDT) Received: from cslinux29.cs.rice.edu (cslinux29.cs.rice.edu [168.7.116.104]) (using TLSv1.2 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: jma14) by mh2.mail.rice.edu (Postfix) with ESMTPSA id A4E1B50018A; Fri, 23 Aug 2019 17:36:41 -0500 (CDT) Date: Fri, 23 Aug 2019 22:36:00 -0000 From: Jonathon Anderson Subject: Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev() To: Mark Wielaard Cc: elfutils-devel@sourceware.org, Srdan Milakovic Message-Id: <1566599801.1914.0@smtp.mail.rice.edu> In-Reply-To: <5b516eec8319b9fed0be05595ca0ba06746aba7e.camel@klomp.org> References: <1565983469.1826.0@smtp.mail.rice.edu> <410becd068420cbd3b93c161be2084c6a5d2f362.camel@klomp.org> <1566396518.5389.0@smtp.mail.rice.edu> <5b516eec8319b9fed0be05595ca0ba06746aba7e.camel@klomp.org> X-Mailer: geary/0.12.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:5.22.84,1.0.8 definitions=2019-08-23_10:2019-08-21,2019-08-23 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 impostorscore=0 priorityscore=1501 clxscore=1015 suspectscore=0 adultscore=0 phishscore=0 mlxscore=0 lowpriorityscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-1906280000 definitions=main-1908230212 X-SW-Source: 2019-q3/txt/msg00130.txt.bz2 On Fri, Aug 23, 2019 at 1:25 PM, Mark Wielaard wrote: > Hi, > > On Wed, 2019-08-21 at 09:08 -0500, Jonathon Anderson wrote: >> On Wed, Aug 21, 2019 at 6:16 AM, Mark Wielaard >> wrote:On Fri, 2019-08-16 at 14:24 -0500, Jonathon Anderson wrote: >> > > For parallel applications that need the information in the >> DIEs, the >> > > Dwarf_Abbrev hash table et al. become a massive data race. This >> > > fixes >> > > that by: >> > > >> > > 1. Adding atomics & locks to the hash table to manage >> concurrency >> > > (lib/dynamicsizehash_concurrent.{c,h}) >> > > 2. Adding a lock & array structure to the memory manager >> > > (pseudo-TLS) >> > > (libdwP.h, libdw_alloc.c) >> > > 3. Adding extra configure options for Helgrind/DRD annotations >> > > (configure.ac) >> > > 4. Including "stdatomic.h" from FreeBSD, to support C11-style >> > > atomics. >> > > (lib/stdatomic.h) >> > >> > This looks like really nice work. Thanks! >> > >> > I am splitting review in some smaller parts if you don't mind. >> > Simply because it is large and I cannot keep everything in my >> head at >> > once :) > > BTW. I would prefer to handle this as 4 separate additions, probably > in > this order: > > 1) configure stuff for valgrind annotations. > 2) add support for stdatomic.h functions. > 3) thread-safe obstack memory handling > 4) concurrent dynamic hash table. Sure thing, I can split the patch into bits over the weekend. I may take your advice and just use git request-pull though. > >> > If the compiler provides stdatomic.h then I think it would be good >> > to >> > use that instead of our own implementation. The copyright isn't a >> > problem. But do you have a reference/URL to the upstream version? >> I >> > like to add that somewhere, so we can sync with it in the future. >> I >> > see >> > various commented out parts. Was that already upstream? Should we >> just >> > remove those parts? >> >> It would definitely be preferable to use the compiler's >> implementation >> if possible, we used this in case GCC 4.7 and 4.8 (RHEL7) >> compatibility >> was needed. If those versions are old enough the file can be removed >> entirely. >> >> The upstream is at >> https://github.com/freebsd/freebsd/blob/master/sys/sys/stdatomic.h, >> although the version here appears to be slightly modified (we used >> the >> version that HPCToolkit ships). The components we use don't seem >> affected, so a resync shouldn't make a difference. > > OK, then we should come up with some kind of configure test to see if > we can use the standard stdatomic.h and otherwise use our own. I am > surprised I cannot find other projects doing this. Would be nice to > "steal" something standard for this. At least OpenSSL does it: https://github.com/openssl/openssl/blob/master/include/internal/refcount.h, the interesting note being that it has a series of fallbacks (various compiler builtins and then locks). The other projects I skimmed just have the fallbacks and don't check for C11, given that Elfutils only supports GCC that might be a valid (and more compact) approach. > >> > > - Currently the concurrent hash table is always enabled, >> > > performance-wise there is no known difference between it >> > > and the non-concurrent version. >> > > This can be changed to toggle with --enable-thread-safety >> > > if preferred. >> > >> > I would prefer it always enabled, unless there is a massive >> slowdown >> > of >> > the single-threaded case. The problem with --enable-thread-safety >> is >> > that it is a) known broken (sigh) and b) it basically introduces >> two >> > separate libraries that behave subtly differently. I would very >> much >> > like to get rid of --enable-thread-safety by fixing the broken >> locking >> > and simply making it the default. >> >> I haven't noticed any slowdown in the single-threaded case, >> although I >> haven't stressed it hard enough to find out for certain. From a >> theoretical standpoint it shouldn't, atomics (with the proper memory >> orders) are usually (on x86 at least) as cheap as normal accesses >> when >> used by a single thread, and the algorithm is otherwise effectively >> the >> same as the original hash table. >> >> How difficult would it be to fix the locking (or rather, what's >> "broken")? We would definitely benefit from having thread-safety at >> least for getters, which would only need locks around the internal >> management. > > To be honest I don't know how badly it is broken. > It is only implemented for libelf. > If you configure --enable-thread-safety and make check you will see > several tests fail because they abort with Unexpected error: Resource > deadlock avoided. > > I think it is mainly that nobody maintained the locks and now some are > just wrongly placed. Ideally we switch --enable-thread-safety on by > default, identify which locks are wrongly placed, run all tests with > valgrind/hellgrind and fix any issues found. > > It really has not been a priority. Sorry. No worries, its not a priority on our end either. Elfutils' codebase is significantly simpler (IMHO) than ours, so if it ever comes up we'll just submit another patch. > >> > > - Another implementation of #2 above might use dynamic TLS >> > > (pthread_key_*), >> > > we chose this implementation to reduce the overall >> complexity. >> > >> > Are there any other trade-offs? >> >> If the application spawns N threads that all use a Dwarf object >> (same >> or different) enough to cause an allocation, and then joins them >> all, >> any Dwarf objects allocated after will allocate N unusable slots in >> the >> mem_tails array. Thus long-running applications (that don't use >> thread >> pools) would start experiencing effects similar to a memory leak, >> of 1 >> pointer's worth (8 bytes on 64-bit) per dead thread. >> >> The alternative is to use dynamic TLS so that only threads that are >> currently active use the extra memory, assuming libpthread is >> sufficiently proactive about reclaiming unused key values. I think >> if >> we assume `dwarf_end` happens-after any memory management (which >> would >> make sense for a destructor), there should be a simple atomic >> pattern >> to handle the freeing, but I would need to sit down for a while to >> work >> out a sufficient proof. >> >> I was also under the impression that dynamic TLS was particularly >> expensive performance-wise, although I haven't experimented with it >> myself enough to know. The compiler can be a little smarter about >> static TLS, and the result is easier to reason about, which is why >> we >> chose this implementation for initial patch. If the alternative >> would >> be preferable we can change it. > > I thought a bit about this one and although I am slightly worried > about > the possible indefinite growing of the thread_ids, I don't think the > "memory leak" is an issue. Concurrent usage of the Dwarf object > already > costs a bit more memory (since each thread gets its own memory block > if > they allocate at the same time) which is probably larger than any > extra > created by reserving space for all possible thread_ids. This is only > really a problem for a program that doesn't use thread pools and keeps > opening and concurrently accessing new Dwarf objects (because at a > certain point the whole Dwarf will have been read and no new > allocations happen). Although it would be nice if we could somehow > reset the next_id to zero in dwarf_end (), when this is the last > thread > or Dwarf object. The memory overhead is a little worse than that (each thread allocates its own memory blocks, period), but that would be present in both implementations. I can't think of a simple way to increase the memory efficiency past that (although I can think of some ridiculously complex ways). I suppose it would be possible to use a sort of free-list for the IDs, although that requires a hook at thread exit (doable with PThreads, not with C11) and cleaning up would be a bit of a nightmare. At some point dynamic TLS is more robust against weird situations (and, if my thought-proof is correct, simpler). > > Cheers, > > Mark