From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 2/2] libdw: Rewrite the memory handler to be more robust.
From: Mark Wielaard
To: elfutils-devel@sourceware.org
Cc: Jonathon Anderson
Date: Thu, 07 Nov 2019 17:20:00 -0000
Message-ID: <93a4d8983b6ec43c09c6c3b3f6ed8d358321bb9d.camel@klomp.org>
In-Reply-To: <20191029211437.3268-2-mark@klomp.org>
References: <1572380520.19948.0@rice.edu>
 <20191029211437.3268-1-mark@klomp.org>
 <20191029211437.3268-2-mark@klomp.org>

Hi Jonathon,

On Tue, 2019-10-29 at 22:14 +0100, Mark Wielaard wrote:
> Pthread's thread-local variables are highly limited, which makes
> it difficult to use many Dwarfs.  This replaces that with a
> less efficient (or elegant) but more robust method.

Thanks, it looks good (similar to a variant I already reviewed
earlier).  Some small comments below.

I haven't benchmarked this version. Did you do any?

> diff --git a/lib/atomics.h b/lib/atomics.h
> index ffd12f87..e3773759 100644
> --- a/lib/atomics.h
> +++ b/lib/atomics.h
> @@ -31,7 +31,9 @@
>  #if HAVE_STDATOMIC_H
>  /* If possible, use the compiler's preferred atomics.  */
>  # include <stdatomic.h>
> +# include <threads.h>
>  #else
>  /* Otherwise, try to use the builtins provided by this compiler.  */
>  # include "stdatomic-fbsd.h"
> +# define thread_local __thread
>  #endif /* HAVE_STDATOMIC_H */

Do we really need this? We already use __thread unconditionally in
the rest of the code. The usage of threads.h seems to imply we
actually want C11 _Thread_local. Is that what you really want, or can
we just use __thread in libdw_alloc.c for thread_id?
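
For reference, as far as I can see all three spellings below behave
the same for a static variable like thread_id. Just a quick
illustration of what the question is about, not something for the
patch:

/* One instance per thread in all three cases.  thread_local is just a
   macro for the C11 keyword _Thread_local, and __thread is the
   GNU/Clang spelling that the rest of the code already uses.  */
#include <stddef.h>
#include <threads.h>            /* only needed for the thread_local macro */

static thread_local size_t a;   /* C11 macro from <threads.h> */
static _Thread_local size_t b;  /* C11 keyword, no header needed */
static __thread size_t c;       /* GNU extension */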
> diff --git a/libdw/dwarf_begin_elf.c b/libdw/dwarf_begin_elf.c
> index 8d137414..eadff13b 100644
> --- a/libdw/dwarf_begin_elf.c
> +++ b/libdw/dwarf_begin_elf.c
> @@ -418,13 +418,14 @@ dwarf_begin_elf (Elf *elf, Dwarf_Cmd cmd, Elf_Scn *scngrp)
>       actual allocation.  */
>    result->mem_default_size = mem_default_size;
>    result->oom_handler = __libdw_oom;
> -  if (pthread_key_create (&result->mem_key, NULL) != 0)
> +  if (pthread_rwlock_init(&result->mem_rwl, NULL) != 0)
>      {
>        free (result);
> -      __libdw_seterrno (DWARF_E_NOMEM); /* no memory or max pthread keys.  */
> +      __libdw_seterrno (DWARF_E_NOMEM); /* no memory.  */
>        return NULL;
>      }
> -  atomic_init (&result->mem_tail, (uintptr_t)NULL);
> +  result->mem_stacks = 0;
> +  result->mem_tails = NULL;
>
>    if (cmd == DWARF_C_READ || cmd == DWARF_C_RDWR)
>      {

OK.

> diff --git a/libdw/dwarf_end.c b/libdw/dwarf_end.c
> index a2e94436..3fd2836d 100644
> --- a/libdw/dwarf_end.c
> +++ b/libdw/dwarf_end.c
> @@ -95,17 +95,19 @@ dwarf_end (Dwarf *dwarf)
>        tdestroy (dwarf->split_tree, noop_free);
>
>        /* Free the internally allocated memory.  */
> -      struct libdw_memblock *memp;
> -      memp = (struct libdw_memblock *) (atomic_load_explicit
> -                                        (&dwarf->mem_tail,
> -                                         memory_order_relaxed));
> -      while (memp != NULL)
> -        {
> -          struct libdw_memblock *prevp = memp->prev;
> -          free (memp);
> -          memp = prevp;
> -        }
> -      pthread_key_delete (dwarf->mem_key);
> +      for (size_t i = 0; i < dwarf->mem_stacks; i++)
> +        {
> +          struct libdw_memblock *memp = dwarf->mem_tails[i];
> +          while (memp != NULL)
> +            {
> +              struct libdw_memblock *prevp = memp->prev;
> +              free (memp);
> +              memp = prevp;
> +            }
> +        }
> +      if (dwarf->mem_tails != NULL)
> +        free (dwarf->mem_tails);
> +      pthread_rwlock_destroy (&dwarf->mem_rwl);
>
>        /* Free the pubnames helper structure.  */
>        free (dwarf->pubnames_sets);

OK. Might it be an idea to have some call to reset next_id (see
below)?

> diff --git a/libdw/libdwP.h b/libdw/libdwP.h
> index ad2599eb..3e1ef59b 100644
> --- a/libdw/libdwP.h
> +++ b/libdw/libdwP.h
> @@ -149,17 +149,6 @@ enum
>
>  #include "dwarf_sig8_hash.h"
>
> -/* Structure for internal memory handling.  This is basically a simplified
> -   reimplementation of obstacks.  Unfortunately the standard obstack
> -   implementation is not usable in libraries.  */
> -struct libdw_memblock
> -{
> -  size_t size;
> -  size_t remaining;
> -  struct libdw_memblock *prev;
> -  char mem[0];
> -};
> -
>  /* This is the structure representing the debugging state.  */
>  struct Dwarf
>  {
> @@ -231,11 +220,22 @@ struct Dwarf
>    /* Similar for addrx/constx, which will come from .debug_addr section.  */
>    struct Dwarf_CU *fake_addr_cu;
>
> -  /* Internal memory handling.  Each thread allocates separately and only
> -     allocates from its own blocks, while all the blocks are pushed atomically
> -     onto a unified stack for easy deallocation.  */
> -  pthread_key_t mem_key;
> -  atomic_uintptr_t mem_tail;
> +  /* Supporting lock for internal memory handling.  Ensures threads that have
> +     an entry in the mem_tails array are not disturbed by new threads doing
> +     allocations for this Dwarf.  */
> +  pthread_rwlock_t mem_rwl;
> +
> +  /* Internal memory handling.  This is basically a simplified thread-local
> +     reimplementation of obstacks.  Unfortunately the standard obstack
> +     implementation is not usable in libraries.  */
> +  size_t mem_stacks;
> +  struct libdw_memblock
> +  {
> +    size_t size;
> +    size_t remaining;
> +    struct libdw_memblock *prev;
> +    char mem[0];
> +  } **mem_tails;
>
>    /* Default size of allocated memory blocks.  */
>    size_t mem_default_size;

OK.

> @@ -578,34 +578,31 @@ libdw_valid_user_form (int form)
>  extern void __libdw_seterrno (int value) internal_function;
>
>
> -/* Memory handling, the easy parts.  This macro does not do nor need to do any
> -   locking for proper concurrent operation.  */
> +/* Memory handling, the easy parts.  */
>  #define libdw_alloc(dbg, type, tsize, cnt) \
> -  ({ struct libdw_memblock *_tail = pthread_getspecific (dbg->mem_key); \
> -     size_t _req = (tsize) * (cnt); \
> -     type *_result; \
> -     if (unlikely (_tail == NULL)) \
> -       _result = (type *) __libdw_allocate (dbg, _req, __alignof (type)); \
> +  ({ struct libdw_memblock *_tail = __libdw_alloc_tail(dbg); \
> +     size_t _required = (tsize) * (cnt); \
> +     type *_result = (type *) (_tail->mem + (_tail->size - _tail->remaining));\
> +     size_t _padding = ((__alignof (type) \
> +                         - ((uintptr_t) _result & (__alignof (type) - 1))) \
> +                        & (__alignof (type) - 1)); \
> +     if (unlikely (_tail->remaining < _required + _padding)) \
> +       _result = (type *) __libdw_allocate (dbg, _required, __alignof (type));\
>       else \
>         { \
> -         _result = (type *) (_tail->mem + (_tail->size - _tail->remaining)); \
> -         size_t _padding = ((__alignof (type) \
> -                             - ((uintptr_t) _result & (__alignof (type) - 1))) \
> -                            & (__alignof (type) - 1)); \
> -         if (unlikely (_tail->remaining < _req + _padding)) \
> -           _result = (type *) __libdw_allocate (dbg, _req, __alignof (type)); \
> -         else \
> -           { \
> -             _req += _padding; \
> -             _result = (type *) ((char *) _result + _padding); \
> -             _tail->remaining -= _req; \
> -           } \
> +         _required += _padding; \
> +         _result = (type *) ((char *) _result + _padding); \
> +         _tail->remaining -= _required; \
>         } \
>       _result; })

OK. Maybe add a comment that __libdw_alloc_tail makes sure that we
get a thread-local libdw_memblock to work with.
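
Not really a comment on the patch, but to convince myself the padding
expression in libdw_alloc does what it should, I played with this
little standalone check (padding_for is just a made-up name for the
expression):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The expression from libdw_alloc: how many bytes to skip so that
   ADDR becomes aligned to ALIGN (ALIGN must be a power of two).  */
static size_t
padding_for (uintptr_t addr, size_t align)
{
  return (align - (addr & (align - 1))) & (align - 1);
}

int
main (void)
{
  assert (padding_for (0x1000, 8) == 0);   /* already aligned */
  assert (padding_for (0x1001, 8) == 7);   /* worst case */
  assert (padding_for (0x1004, 8) == 4);
  assert (padding_for (0x1007, 2) == 1);
  puts ("padding arithmetic checks out");
  return 0;
}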
> @@ -578,34 +578,31 @@ libdw_valid_user_form (int form) > extern void __libdw_seterrno (int value) internal_function; >=20=20 >=20=20 > -/* Memory handling, the easy parts. This macro does not do nor need to = do any > - locking for proper concurrent operation. */ > +/* Memory handling, the easy parts. */ > #define libdw_alloc(dbg, type, tsize, cnt) \ > - ({ struct libdw_memblock *_tail =3D pthread_getspecific (dbg->mem_key)= ; \ > - size_t _req =3D (tsize) * (cnt); \ > - type *_result; \ > - if (unlikely (_tail =3D=3D NULL)) \ > - _result =3D (type *) __libdw_allocate (dbg, _req, __alignof (type= )); \ > + ({ struct libdw_memblock *_tail =3D __libdw_alloc_tail(dbg); \ > + size_t _required =3D (tsize) * (cnt); \ > + type *_result =3D (type *) (_tail->mem + (_tail->size - _tail->rema= ining));\ > + size_t _padding =3D ((__alignof (type) \ > + - ((uintptr_t) _result & (__alignof (type) - 1))) \ > + & (__alignof (type) - 1)); \ > + if (unlikely (_tail->remaining < _required + _padding)) \ > + _result =3D (type *) __libdw_allocate (dbg, _required, __alignof = (type));\ > else \ > { \ > - _result =3D (type *) (_tail->mem + (_tail->size - _tail->remaining)); = \ > - size_t _padding =3D ((__alignof (type) \ > - - ((uintptr_t) _result & (__alignof (type) - 1))) \ > - & (__alignof (type) - 1)); \ > - if (unlikely (_tail->remaining < _req + _padding)) \ > - _result =3D (type *) __libdw_allocate (dbg, _req, __alignof (type));= \ > - else \ > - { \ > - _req +=3D _padding; \ > - _result =3D (type *) ((char *) _result + _padding); \ > - _tail->remaining -=3D _req; \ > - } \ > + _required +=3D _padding; \ > + _result =3D (type *) ((char *) _result + _padding); \ > + _tail->remaining -=3D _required; \ > } \ > _result; }) OK. Maybe add a comment that __libdw_alloc_tail makes sure that we get a thread local libdw_memblock to work with. > #define libdw_typed_alloc(dbg, type) \ > libdw_alloc (dbg, type, sizeof (type), 1) >=20=20 > +/* Callback to choose a thread-local memory allocation stack. */ > +extern struct libdw_memblock *__libdw_alloc_tail (Dwarf* dbg) > + __nonnull_attribute__ (1); > + > /* Callback to allocate more. */ > extern void *__libdw_allocate (Dwarf *dbg, size_t minsize, size_t align) > __attribute__ ((__malloc__)) __nonnull_attribute__ (1); O. There is the comment already :) > diff --git a/libdw/libdw_alloc.c b/libdw/libdw_alloc.c > index f2e74d18..86ca7032 100644 > --- a/libdw/libdw_alloc.c > +++ b/libdw/libdw_alloc.c > @@ -35,7 +35,68 @@ > #include > #include "libdwP.h" > #include "system.h" > +#include "atomics.h" > +#if USE_VG_ANNOTATIONS =3D=3D 1 > +#include > +#include I think if you include helgrind.h you won't get the drd.h ANNOTATE_HAPPENS_BEFORE/AFTER. So do you also need to include drd.h? > +#else > +#define ANNOTATE_HAPPENS_BEFORE(X) > +#define ANNOTATE_HAPPENS_AFTER(X) > +#endif Could you explain the usage of the happens_before/after annotations in this code. I must admit that I don't fully understand why/how it works in this case. Specifically since realloc might change the address that mem_tails points to. > +#define THREAD_ID_UNSET ((size_t) -1) > +static thread_local size_t thread_id =3D THREAD_ID_UNSET; > +static atomic_size_t next_id =3D ATOMIC_VAR_INIT(0); OK, but maybe use static __thread size_t thread_id as explained above? 
> +struct libdw_memblock *
> +__libdw_alloc_tail (Dwarf *dbg)
> +{
> +  if (thread_id == THREAD_ID_UNSET)
> +    thread_id = atomic_fetch_add (&next_id, 1);
> +
> +  pthread_rwlock_rdlock (&dbg->mem_rwl);
> +  if (thread_id >= dbg->mem_stacks)
> +    {
> +      pthread_rwlock_unlock (&dbg->mem_rwl);
> +      pthread_rwlock_wrlock (&dbg->mem_rwl);
> +
> +      /* Another thread may have already reallocated.  In theory using an
> +         atomic would be faster, but given that this only happens once per
> +         thread per Dwarf, some minor slowdown should be fine.  */
> +      if (thread_id >= dbg->mem_stacks)
> +        {
> +          dbg->mem_tails = realloc (dbg->mem_tails, (thread_id+1)
> +                                    * sizeof (struct libdw_memblock *));
> +          if (dbg->mem_tails == NULL)
> +            {
> +              pthread_rwlock_unlock (&dbg->mem_rwl);
> +              dbg->oom_handler();
> +            }
> +          for (size_t i = dbg->mem_stacks; i <= thread_id; i++)
> +            dbg->mem_tails[i] = NULL;
> +          dbg->mem_stacks = thread_id + 1;
> +          ANNOTATE_HAPPENS_BEFORE (&dbg->mem_tails);
> +        }
> +
> +      pthread_rwlock_unlock (&dbg->mem_rwl);
> +      pthread_rwlock_rdlock (&dbg->mem_rwl);
> +    }

OK. So next_id only increases, which means it might leak a bit
without thread pools. Might it make sense to have some
__libdw_destroy_tail () function that could be called from dwarf_end
that checks if this was the last Dwarf in use, so we could reset
next_id? It would only work in some cases; if there are multiple
Dwarfs in use it is probably useless. I guess it is too much trickery
for an odd corner case?

O, and I now think you would then also need something for dwarf_begin
to reset any set thread_ids... bleah. So probably way too
complicated. So let's not, unless you think this is actually simple.

> +  /* At this point, we have an entry in the tail array.  */
> +  ANNOTATE_HAPPENS_AFTER (&dbg->mem_tails);
> +  struct libdw_memblock *result = dbg->mem_tails[thread_id];
> +  if (result == NULL)
> +    {
> +      result = malloc (dbg->mem_default_size);
> +      result->size = dbg->mem_default_size
> +                     - offsetof (struct libdw_memblock, mem);
> +      result->remaining = result->size;
> +      result->prev = NULL;
> +      dbg->mem_tails[thread_id] = result;
> +    }
> +  pthread_rwlock_unlock (&dbg->mem_rwl);
> +  return result;
> +}

OK.

>  void *
>  __libdw_allocate (Dwarf *dbg, size_t minsize, size_t align)
> @@ -52,10 +113,10 @@ __libdw_allocate (Dwarf *dbg, size_t minsize, size_t align)
>    newp->size = size - offsetof (struct libdw_memblock, mem);
>    newp->remaining = (uintptr_t) newp + size - (result + minsize);
>
> -  newp->prev = (struct libdw_memblock*)atomic_exchange_explicit(
> -    &dbg->mem_tail, (uintptr_t)newp, memory_order_relaxed);
> -  if (pthread_setspecific (dbg->mem_key, newp) != 0)
> -    dbg->oom_handler ();
> +  pthread_rwlock_rdlock (&dbg->mem_rwl);
> +  newp->prev = dbg->mem_tails[thread_id];
> +  dbg->mem_tails[thread_id] = newp;
> +  pthread_rwlock_unlock (&dbg->mem_rwl);
>
>    return (void *) result;
>  }

OK. Since this is only called after __libdw_alloc_tail you know that
thread_id will be set.

Thanks,

Mark
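
P.S. For anyone else following along, the essential bit in
__libdw_alloc_tail is the re-check after trading the rdlock for the
wrlock. Boiled down to a generic sketch (this is not libdw code, all
names are made up):

#include <pthread.h>
#include <stdlib.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static void **slots;            /* one slot per id, grown on demand */
static size_t nslots;

/* Return the slot value for ID, growing the shared array if needed.
   Only the thread owning ID ever writes slots[ID], so a read lock is
   enough for everything except resizing the array itself.  */
static void *
get_slot (size_t id)
{
  pthread_rwlock_rdlock (&lock);
  if (id >= nslots)
    {
      pthread_rwlock_unlock (&lock);
      pthread_rwlock_wrlock (&lock);
      /* Re-check: another thread may have grown the array between our
         unlock and wrlock.  Without this, a second (possibly smaller)
         realloc could throw away slots already handed out.  */
      if (id >= nslots)
        {
          void **new_slots = realloc (slots, (id + 1) * sizeof *slots);
          if (new_slots == NULL)
            {
              pthread_rwlock_unlock (&lock);
              return NULL;
            }
          for (size_t i = nslots; i <= id; i++)
            new_slots[i] = NULL;
          slots = new_slots;
          nslots = id + 1;
        }
      pthread_rwlock_unlock (&lock);
      pthread_rwlock_rdlock (&lock);
    }
  if (slots[id] == NULL)
    slots[id] = malloc (64);    /* stand-in for the real per-thread state */
  void *result = slots[id];
  pthread_rwlock_unlock (&lock);
  return result;
}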