From: Jonathon Anderson
Date: Thu, 07 Nov 2019 18:40:00 -0000
Subject: Re: [PATCH 2/2] libdw: Rewrite the memory handler to be more robust.
To: Mark Wielaard
Cc: elfutils-devel@sourceware.org
Message-Id: <1573152030.2173.2@rice.edu>
In-Reply-To: <93a4d8983b6ec43c09c6c3b3f6ed8d358321bb9d.camel@klomp.org>
References: <1572380520.19948.0@rice.edu>
 <20191029211437.3268-1-mark@klomp.org>
 <20191029211437.3268-2-mark@klomp.org>
 <93a4d8983b6ec43c09c6c3b3f6ed8d358321bb9d.camel@klomp.org>

On Thu, Nov 7, 2019 at 18:20, Mark Wielaard wrote:
> Hi Jonathan,
>
> On Tue, 2019-10-29 at 22:14 +0100, Mark Wielaard wrote:
>> Pthread's thread-local variables are highly limited, which makes
>> it difficult to use many Dwarfs. This replaces that with a
>> less efficient (or elegant) but more robust method.
>
> Thanks, it looks good (similar to a variant I already reviewed
> earlier). Some small comments below.
>
> I haven't benchmarked this version. Did you do any?

I haven't benchmarked this version, but I did benchmark the equivalent
earlier version (this version is almost literally a rebase of the
other). I don't have the exact results on hand, but what I remember is
that the pthread_key method was faster (and handled the many-thread
case better), by maybe a factor of 1.5x-2x in parallel. In serial the
overhead was minimal (just an extra pointer indirection on
allocations).

The fastest (I think) would be the "cloned" Dwarf idea, but that
requires an overhaul.

>
>> diff --git a/lib/atomics.h b/lib/atomics.h
>> index ffd12f87..e3773759 100644
>> --- a/lib/atomics.h
>> +++ b/lib/atomics.h
>> @@ -31,7 +31,9 @@
>>  #if HAVE_STDATOMIC_H
>>  /* If possible, use the compiler's preferred atomics.  */
>>  # include <stdatomic.h>
>> +# include <threads.h>
>>  #else
>>  /* Otherwise, try to use the builtins provided by this compiler.  */
>>  # include "stdatomic-fbsd.h"
>> +# define thread_local __thread
>>  #endif /* HAVE_STDATOMIC_H */
>
> Do we really need this?
> We already use __thread unconditionally in the rest of the code.
> The usage of threads.h seems to imply we actually want C11
> _Thread_local. Is that what you really want, or can we just use
> __thread in libdw_alloc.c for thread_id?
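For readers following along, the difference being discussed here is
mostly one of spelling: with a C11 toolchain, <threads.h> makes
thread_local a macro for the _Thread_local keyword, while __thread is
the GNU spelling that elfutils already uses elsewhere. A minimal
sketch of the two, with hypothetical variable names rather than the
patch's code (the first assumes a libc that ships C11 <threads.h>):

  #include <stddef.h>
  #include <stdio.h>

  /* C11 spelling: <threads.h> defines thread_local as a macro that
     expands to the _Thread_local keyword.  */
  #include <threads.h>
  static thread_local size_t tid_c11 = (size_t) -1;

  /* GNU spelling: accepted by GCC and Clang without any header.  */
  static __thread size_t tid_gnu = (size_t) -1;

  int
  main (void)
  {
    /* Every thread would see its own copy of both variables.  */
    printf ("%zu %zu\n", tid_c11, tid_gnu);
    return 0;
  }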
>
>> diff --git a/libdw/dwarf_begin_elf.c b/libdw/dwarf_begin_elf.c
>> index 8d137414..eadff13b 100644
>> --- a/libdw/dwarf_begin_elf.c
>> +++ b/libdw/dwarf_begin_elf.c
>> @@ -418,13 +418,14 @@ dwarf_begin_elf (Elf *elf, Dwarf_Cmd cmd, Elf_Scn *scngrp)
>>       actual allocation.  */
>>    result->mem_default_size = mem_default_size;
>>    result->oom_handler = __libdw_oom;
>> -  if (pthread_key_create (&result->mem_key, NULL) != 0)
>> +  if (pthread_rwlock_init(&result->mem_rwl, NULL) != 0)
>>      {
>>        free (result);
>> -      __libdw_seterrno (DWARF_E_NOMEM); /* no memory or max pthread keys.  */
>> +      __libdw_seterrno (DWARF_E_NOMEM); /* no memory.  */
>>        return NULL;
>>      }
>> -  atomic_init (&result->mem_tail, (uintptr_t)NULL);
>> +  result->mem_stacks = 0;
>> +  result->mem_tails = NULL;
>>
>>    if (cmd == DWARF_C_READ || cmd == DWARF_C_RDWR)
>>      {
>
> OK.
>
>> diff --git a/libdw/dwarf_end.c b/libdw/dwarf_end.c
>> index a2e94436..3fd2836d 100644
>> --- a/libdw/dwarf_end.c
>> +++ b/libdw/dwarf_end.c
>> @@ -95,17 +95,19 @@ dwarf_end (Dwarf *dwarf)
>>        tdestroy (dwarf->split_tree, noop_free);
>>
>>        /* Free the internally allocated memory.  */
>> -      struct libdw_memblock *memp;
>> -      memp = (struct libdw_memblock *) (atomic_load_explicit
>> -                                        (&dwarf->mem_tail,
>> -                                         memory_order_relaxed));
>> -      while (memp != NULL)
>> -        {
>> -          struct libdw_memblock *prevp = memp->prev;
>> -          free (memp);
>> -          memp = prevp;
>> -        }
>> -      pthread_key_delete (dwarf->mem_key);
>> +      for (size_t i = 0; i < dwarf->mem_stacks; i++)
>> +        {
>> +          struct libdw_memblock *memp = dwarf->mem_tails[i];
>> +          while (memp != NULL)
>> +            {
>> +              struct libdw_memblock *prevp = memp->prev;
>> +              free (memp);
>> +              memp = prevp;
>> +            }
>> +        }
>> +      if (dwarf->mem_tails != NULL)
>> +        free (dwarf->mem_tails);
>> +      pthread_rwlock_destroy (&dwarf->mem_rwl);
>>
>>        /* Free the pubnames helper structure.  */
>>        free (dwarf->pubnames_sets);
>
> OK. Might it be an idea to have some call to reset next_id (see
> below)?
>
>> diff --git a/libdw/libdwP.h b/libdw/libdwP.h
>> index ad2599eb..3e1ef59b 100644
>> --- a/libdw/libdwP.h
>> +++ b/libdw/libdwP.h
>> @@ -149,17 +149,6 @@ enum
>>
>>  #include "dwarf_sig8_hash.h"
>>
>> -/* Structure for internal memory handling.  This is basically a simplified
>> -   reimplementation of obstacks.  Unfortunately the standard obstack
>> -   implementation is not usable in libraries.  */
>> -struct libdw_memblock
>> -{
>> -  size_t size;
>> -  size_t remaining;
>> -  struct libdw_memblock *prev;
>> -  char mem[0];
>> -};
>> -
>>  /* This is the structure representing the debugging state.  */
>>  struct Dwarf
>>  {
>> @@ -231,11 +220,22 @@ struct Dwarf
>>    /* Similar for addrx/constx, which will come from .debug_addr section.  */
>>    struct Dwarf_CU *fake_addr_cu;
>>
>> -  /* Internal memory handling.  Each thread allocates separately and only
>> -     allocates from its own blocks, while all the blocks are pushed atomically
>> -     onto a unified stack for easy deallocation.  */
>> -  pthread_key_t mem_key;
>> -  atomic_uintptr_t mem_tail;
>> +  /* Supporting lock for internal memory handling.  Ensures threads that have
>> +     an entry in the mem_tails array are not disturbed by new threads doing
>> +     allocations for this Dwarf.  */
>> +  pthread_rwlock_t mem_rwl;
>> +
>> +  /* Internal memory handling.  This is basically a simplified thread-local
>> +     reimplementation of obstacks.  Unfortunately the standard obstack
>> +     implementation is not usable in libraries.  */
>> +  size_t mem_stacks;
>> +  struct libdw_memblock
>> +  {
>> +    size_t size;
>> +    size_t remaining;
>> +    struct libdw_memblock *prev;
>> +    char mem[0];
>> +  } **mem_tails;
>>
>>    /* Default size of allocated memory blocks.  */
>>    size_t mem_default_size;
>
> OK.
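The part of this design that is easy to misread is the locking
discipline around mem_rwl: allocating threads take the lock in *read*
mode even though they write, because each thread only ever touches its
own mem_tails slot; the *write* lock is taken only to grow the array
itself, which is the one operation that can move other threads' slots.
A stripped-down sketch of the same pattern, with hypothetical names
rather than the actual libdw code:

  #include <pthread.h>
  #include <stdlib.h>

  /* Hypothetical grow-only table of per-thread slots, mirroring the
     mem_rwl/mem_stacks/mem_tails scheme.  slots[i] is owned by thread
     i exclusively, so readers never race with each other.  */
  struct table
  {
    pthread_rwlock_t lock;
    size_t nslots;
    void **slots;
  };

  static void *
  slot_get (struct table *t, size_t id)
  {
    pthread_rwlock_rdlock (&t->lock);
    if (id >= t->nslots)
      {
        /* "Upgrade": drop the read lock, take the write lock, and
           re-check, since another thread may have grown the table in
           the window between the two calls.  */
        pthread_rwlock_unlock (&t->lock);
        pthread_rwlock_wrlock (&t->lock);
        if (id >= t->nslots)
          {
            void **ns = realloc (t->slots, (id + 1) * sizeof *ns);
            if (ns == NULL)
              {
                /* The real code calls the registered OOM handler.  */
                pthread_rwlock_unlock (&t->lock);
                return NULL;
              }
            for (size_t i = t->nslots; i <= id; i++)
              ns[i] = NULL;
            t->slots = ns;
            t->nslots = id + 1;
          }
        /* Downgrade back to shared mode.  Because the table only ever
           grows, slots[id] stays valid once it exists.  */
        pthread_rwlock_unlock (&t->lock);
        pthread_rwlock_rdlock (&t->lock);
      }
    void *res = t->slots[id];
    pthread_rwlock_unlock (&t->lock);
    return res;
  }

  int
  main (void)
  {
    struct table t = { PTHREAD_RWLOCK_INITIALIZER, 0, NULL };
    slot_get (&t, 3);   /* grows the table to 4 slots, all NULL */
    free (t.slots);
    return 0;
  }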
>> @@ -578,34 +578,31 @@ libdw_valid_user_form (int form)
>>  extern void __libdw_seterrno (int value) internal_function;
>>
>>
>> -/* Memory handling, the easy parts.  This macro does not do nor need to do any
>> -   locking for proper concurrent operation.  */
>> +/* Memory handling, the easy parts.  */
>>  #define libdw_alloc(dbg, type, tsize, cnt) \
>> -  ({ struct libdw_memblock *_tail = pthread_getspecific (dbg->mem_key); \
>> -     size_t _req = (tsize) * (cnt); \
>> -     type *_result; \
>> -     if (unlikely (_tail == NULL)) \
>> -       _result = (type *) __libdw_allocate (dbg, _req, __alignof (type)); \
>> +  ({ struct libdw_memblock *_tail = __libdw_alloc_tail(dbg); \
>> +     size_t _required = (tsize) * (cnt); \
>> +     type *_result = (type *) (_tail->mem + (_tail->size - _tail->remaining));\
>> +     size_t _padding = ((__alignof (type) \
>> +                        - ((uintptr_t) _result & (__alignof (type) - 1))) \
>> +                       & (__alignof (type) - 1)); \
>> +     if (unlikely (_tail->remaining < _required + _padding)) \
>> +       _result = (type *) __libdw_allocate (dbg, _required, __alignof (type));\
>>      else \
>>        { \
>> -        _result = (type *) (_tail->mem + (_tail->size - _tail->remaining)); \
>> -        size_t _padding = ((__alignof (type) \
>> -                           - ((uintptr_t) _result & (__alignof (type) - 1))) \
>> -                          & (__alignof (type) - 1)); \
>> -        if (unlikely (_tail->remaining < _req + _padding)) \
>> -          _result = (type *) __libdw_allocate (dbg, _req, __alignof (type)); \
>> -        else \
>> -          { \
>> -            _req += _padding; \
>> -            _result = (type *) ((char *) _result + _padding); \
>> -            _tail->remaining -= _req; \
>> -          } \
>> +        _required += _padding; \
>> +        _result = (type *) ((char *) _result + _padding); \
>> +        _tail->remaining -= _required; \
>>        } \
>>      _result; })
>
> OK. Maybe add a comment that __libdw_alloc_tail makes sure that we get
> a thread-local libdw_memblock to work with.
>
>>  #define libdw_typed_alloc(dbg, type) \
>>    libdw_alloc (dbg, type, sizeof (type), 1)
>>
>> +/* Callback to choose a thread-local memory allocation stack.  */
>> +extern struct libdw_memblock *__libdw_alloc_tail (Dwarf* dbg)
>> +     __nonnull_attribute__ (1);
>> +
>>  /* Callback to allocate more.  */
>>  extern void *__libdw_allocate (Dwarf *dbg, size_t minsize, size_t align)
>>       __attribute__ ((__malloc__)) __nonnull_attribute__ (1);
>
> Oh, there is the comment already :)
>
>> diff --git a/libdw/libdw_alloc.c b/libdw/libdw_alloc.c
>> index f2e74d18..86ca7032 100644
>> --- a/libdw/libdw_alloc.c
>> +++ b/libdw/libdw_alloc.c
>> @@ -35,7 +35,68 @@
>>  #include <stdlib.h>
>>  #include "libdwP.h"
>>  #include "system.h"
>> +#include "atomics.h"
>> +#if USE_VG_ANNOTATIONS == 1
>> +#include <helgrind.h>
>> +#include <drd.h>
>
> I think if you include helgrind.h you won't get the drd.h
> ANNOTATE_HAPPENS_BEFORE/AFTER. So do you also need to include drd.h?
>
>> +#else
>> +#define ANNOTATE_HAPPENS_BEFORE(X)
>> +#define ANNOTATE_HAPPENS_AFTER(X)
>> +#endif
>
> Could you explain the usage of the happens_before/after annotations in
> this code? I must admit that I don't fully understand why/how it works
> in this case, specifically since realloc might change the address that
> mem_tails points to.
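For context on the question above: the happens-before/after
annotations do not order anything themselves; they only tell
helgrind/DRD about an ordering the tool cannot deduce on its own. A
minimal illustration of the general mechanism on a hypothetical
atomic-flag handoff, not the libdw code (the USE_VG_ANNOTATIONS guard
mirrors the patch; when it is unset the annotations compile to
nothing):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  #if USE_VG_ANNOTATIONS == 1
  #include <helgrind.h>   /* same include the patch uses */
  #else
  #define ANNOTATE_HAPPENS_BEFORE(X)
  #define ANNOTATE_HAPPENS_AFTER(X)
  #endif

  /* DATA is published through an atomic flag.  The atomic provides
     the real ordering; the annotations merely describe it to the
     tool, which cannot see through atomics.  */
  static int data;
  static atomic_int ready = 0;

  static void *
  producer (void *arg)
  {
    (void) arg;
    data = 42;                         /* plain write, declared...   */
    ANNOTATE_HAPPENS_BEFORE (&ready);  /* ...ordered before...       */
    atomic_store (&ready, 1);
    return NULL;
  }

  static void *
  consumer (void *arg)
  {
    (void) arg;
    while (atomic_load (&ready) == 0)
      ;                                /* spin until published       */
    ANNOTATE_HAPPENS_AFTER (&ready);   /* ...this point here.        */
    printf ("%d\n", data);             /* no race reported on DATA   */
    return NULL;
  }

  int
  main (void)
  {
    pthread_t a, b;
    pthread_create (&a, NULL, producer, NULL);
    pthread_create (&b, NULL, consumer, NULL);
    pthread_join (a, NULL);
    pthread_join (b, NULL);
    return 0;
  }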
>> +#define THREAD_ID_UNSET ((size_t) -1)
>> +static thread_local size_t thread_id = THREAD_ID_UNSET;
>> +static atomic_size_t next_id = ATOMIC_VAR_INIT(0);
>
> OK, but maybe use static __thread size_t thread_id as explained above?
>
>> +struct libdw_memblock *
>> +__libdw_alloc_tail (Dwarf *dbg)
>> +{
>> +  if (thread_id == THREAD_ID_UNSET)
>> +    thread_id = atomic_fetch_add (&next_id, 1);
>> +
>> +  pthread_rwlock_rdlock (&dbg->mem_rwl);
>> +  if (thread_id >= dbg->mem_stacks)
>> +    {
>> +      pthread_rwlock_unlock (&dbg->mem_rwl);
>> +      pthread_rwlock_wrlock (&dbg->mem_rwl);
>> +
>> +      /* Another thread may have already reallocated.  In theory using an
>> +         atomic would be faster, but given that this only happens once per
>> +         thread per Dwarf, some minor slowdown should be fine.  */
>> +      if (thread_id >= dbg->mem_stacks)
>> +        {
>> +          dbg->mem_tails = realloc (dbg->mem_tails, (thread_id+1)
>> +                                    * sizeof (struct libdw_memblock *));
>> +          if (dbg->mem_tails == NULL)
>> +            {
>> +              pthread_rwlock_unlock (&dbg->mem_rwl);
>> +              dbg->oom_handler();
>> +            }
>> +          for (size_t i = dbg->mem_stacks; i <= thread_id; i++)
>> +            dbg->mem_tails[i] = NULL;
>> +          dbg->mem_stacks = thread_id + 1;
>> +          ANNOTATE_HAPPENS_BEFORE (&dbg->mem_tails);
>> +        }
>> +
>> +      pthread_rwlock_unlock (&dbg->mem_rwl);
>> +      pthread_rwlock_rdlock (&dbg->mem_rwl);
>> +    }
>
> OK. So next_id only increases, which means it might leak a bit without
> thread pools.
>
> Might it make sense to have some __libdw_destroy_tail () function that
> could be called from dwarf_end that checks if this was the last Dwarf
> in use so we could reset next_id? It would only work in some cases; if
> there are multiple Dwarfs in use it is probably useless. I guess it is
> too much trickery for an odd corner case?
>
> Oh, and I now think you would then also need something for dwarf_begin
> to reset any set thread_ids... bleah. So probably way too complicated.
> So let's not, unless you think this is actually simple.
>
>> +  /* At this point, we have an entry in the tail array.  */
>> +  ANNOTATE_HAPPENS_AFTER (&dbg->mem_tails);
>> +  struct libdw_memblock *result = dbg->mem_tails[thread_id];
>> +  if (result == NULL)
>> +    {
>> +      result = malloc (dbg->mem_default_size);
>> +      result->size = dbg->mem_default_size
>> +                     - offsetof (struct libdw_memblock, mem);
>> +      result->remaining = result->size;
>> +      result->prev = NULL;
>> +      dbg->mem_tails[thread_id] = result;
>> +    }
>> +  pthread_rwlock_unlock (&dbg->mem_rwl);
>> +  return result;
>> +}
>
> OK.
>
>>  void *
>>  __libdw_allocate (Dwarf *dbg, size_t minsize, size_t align)
>> @@ -52,10 +113,10 @@ __libdw_allocate (Dwarf *dbg, size_t minsize, size_t align)
>>    newp->size = size - offsetof (struct libdw_memblock, mem);
>>    newp->remaining = (uintptr_t) newp + size - (result + minsize);
>>
>> -  newp->prev = (struct libdw_memblock*)atomic_exchange_explicit(
>> -      &dbg->mem_tail, (uintptr_t)newp, memory_order_relaxed);
>> -  if (pthread_setspecific (dbg->mem_key, newp) != 0)
>> -    dbg->oom_handler ();
>> +  pthread_rwlock_rdlock (&dbg->mem_rwl);
>> +  newp->prev = dbg->mem_tails[thread_id];
>> +  dbg->mem_tails[thread_id] = newp;
>> +  pthread_rwlock_unlock (&dbg->mem_rwl);
>>
>>    return (void *) result;
>>  }
>
> OK. Since this is only called after __libdw_alloc_tail you know that
> thread_id will be set.
>
> Thanks,
>
> Mark
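An aside on the padding arithmetic in the libdw_alloc macro quoted
earlier: for a power-of-two alignment, (align - (addr & (align - 1)))
& (align - 1) is the number of bytes needed to round addr up to the
next multiple of align, and it is zero when addr is already aligned.
A small self-contained check, with a hypothetical helper name:

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Bytes of padding needed to round ADDR up to a multiple of ALIGN;
     ALIGN must be a power of two.  Same expression as in the macro.  */
  static size_t
  pad_for (uintptr_t addr, uintptr_t align)
  {
    return (align - (addr & (align - 1))) & (align - 1);
  }

  int
  main (void)
  {
    assert (pad_for (0x1000, 8) == 0);  /* already aligned: no padding */
    assert (pad_for (0x1001, 8) == 7);  /* rounds 0x1001 up to 0x1008  */
    assert (pad_for (0x1007, 8) == 1);
    puts ("padding arithmetic checks out");
    return 0;
  }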