Subject: Re: [PATCH] libdw: add thread-safety to dwarf_getabbrev()
From: Mark Wielaard
To: Jonathon Anderson
Cc: elfutils-devel@sourceware.org, Srdan Milakovic
Date: Sun, 25 Aug 2019 10:05:00 -0000

Hi Jonathon,

On Sat, 2019-08-24 at 20:10 -0500, Jonathon Anderson wrote:
> On Sat, Aug 24, 2019 at 6:24 PM, Mark Wielaard wrote:
> > But what if realloc moves the block?
> > Then all dbg->mem_tails[thread_id] pointers become invalid.
> > And this function drops the lock before returning a libdw_memblock*.
>
> Not quite, dbg->mem_tails is an array of pointers (note the ** in its
> definition, and the use of the plural "tails"). So while the
> dbg->mem_tails pointer becomes invalid, the dbg->mem_tails[thread_id]
> pointers don't.
>
> It would be a different story if dbg->mem_tails was an array of
> libdw_memblocks, but it's not.

Aha. Yes, they are pointers to the mem_blocks, not the mem_blocks
themselves. The pointer values get copied to the new array, but the
values themselves never change. So this is indeed fine. I was confused.

> That would increase the "memory leak"
> issue mentioned previously (to ~4K per dead thread) and require more
> work on the part of the reallocing thread to initialize the new entries
> (which at the moment should reduce to a memset, assuming the compiler
> is smart enough and NULL == 0).
>
> > So I think the lock needs to extend beyond this function somehow. Or
> > we need to prevent another thread reallocing while we are dealing with
> > the assigned memblock.
>
> No need, in fact we want to drop the lock as soon as possible to let
> new threads in for reallocs. Under the assumption that we don't need
> to allocate new blocks (extremely) often, the extra cost to relock when
> we do should be relatively small. Also see __libdw_allocate.

Yes, now that I have a better (correct) mental picture of the data
structure, this makes total sense.
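Just to make sure my mental picture really matches yours, this is
roughly how I now understand the scheme. It is only a sketch, with a
plain mutex and made-up sketch_* names; only mem_tails, libdw_memblock
and the __libdw_* functions come from your patch discussion. But it
shows why handing out the block pointer and then dropping the lock is
safe even if another thread grows the array right afterwards:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only; the real struct and function names in the patch differ.  */
typedef struct sketch_memblock
{
  size_t size;
  size_t remaining;
  char mem[];                    /* the actual allocation area */
} sketch_memblock;

struct sketch_dwarf
{
  pthread_mutex_t mem_lock;      /* protects the mem_tails array itself */
  size_t mem_nthreads;           /* current length of mem_tails */
  sketch_memblock **mem_tails;   /* one tail block pointer per thread */
};

/* Hand out the tail block for this thread.  The lock only covers the
   array; the memblocks it points to never move, so the returned
   pointer stays valid after we unlock, even if another thread
   reallocs mem_tails right away.  (Error handling omitted.)  */
static sketch_memblock *
sketch_alloc_tail (struct sketch_dwarf *dbg, size_t thread_id)
{
  pthread_mutex_lock (&dbg->mem_lock);
  if (thread_id >= dbg->mem_nthreads)
    {
      size_t n = thread_id + 1;
      dbg->mem_tails = realloc (dbg->mem_tails,
                                n * sizeof dbg->mem_tails[0]);
      /* New entries start out NULL; this is the memset you mentioned,
         relying on NULL == all-zero bits.  */
      memset (dbg->mem_tails + dbg->mem_nthreads, 0,
              (n - dbg->mem_nthreads) * sizeof dbg->mem_tails[0]);
      dbg->mem_nthreads = n;
    }
  sketch_memblock *tail = dbg->mem_tails[thread_id];
  pthread_mutex_unlock (&dbg->mem_lock);
  return tail;                   /* may be NULL; the caller then allocates
                                    a fresh block (cf. __libdw_allocate)
                                    and stores it back under the lock.  */
}

If that is (roughly) what __libdw_alloc_tail does, then I agree the
early unlock is fine.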
> As mentioned in other mails, this management scheme isn't (always)
> optimally memory efficient, but its speed is good under parallel
> stress. Far better than wrapping the whole thing with a single
> pthread_mutex_t.

I wouldn't tweak it too much if you know it is working correctly now.
We do have to make sure it doesn't slow down the single-threaded use
case (since most programs are still single-threaded). There is the new
locking, which slows things down. But the memory use should be about
the same, since you don't duplicate the mem_blocks until there is
access from multiple threads.

If there isn't much parallel stress or allocation in practice, that is,
if even in multi-threaded programs there is still just one thread that
does most/all of the allocations, then you could maybe optimize a bit
by having one "owner" thread per Dwarf. That would be the first thread
that hits __libdw_alloc_tail. For that "owner" thread you then set up
an owner thread_id field and have one special mem_block allocated. You
only expand the mem_blocks once another thread does an allocation. Then
the memory use in multi-threaded programs where only one thread handles
a Dwarf object (or happens to allocate) would be the same as for
single-threaded programs.

But this depends on the usage pattern, which might vary a lot between
programs. So don't try it until you know it actually helps for a real
world program. And the extra memory use might not really matter that
much in practice anyway.

Cheers,

Mark
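P.S. To make the "owner" thread idea above a bit more concrete, here is
how I imagine it could look. All the names here are invented and the
code is untested, so take it purely as an illustration of the idea, not
as a proposal against your patch:

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_memblock sketch_memblock;  /* as in the earlier sketch */

/* Hypothetical fields for the "owner thread" variant.  */
struct sketch_dwarf_owner
{
  pthread_mutex_t mem_lock;
  bool have_owner;
  size_t owner_tid;              /* thread_id of the first allocator */
  sketch_memblock *owner_tail;   /* its single, pre-set-up block */
  size_t mem_nthreads;           /* 0 until a second thread allocates */
  sketch_memblock **mem_tails;   /* NULL until a second thread allocates */
};

/* Return the current tail block for this thread.  The common case,
   only one thread ever allocating from this Dwarf, never touches
   mem_tails at all, so the memory use matches the single-threaded
   case.  (Error handling omitted.)  */
static sketch_memblock *
sketch_owner_tail (struct sketch_dwarf_owner *dbg, size_t thread_id)
{
  pthread_mutex_lock (&dbg->mem_lock);

  if (!dbg->have_owner)
    {
      /* The first thread to hit this becomes the owner.  */
      dbg->have_owner = true;
      dbg->owner_tid = thread_id;
    }

  sketch_memblock *tail;
  if (thread_id == dbg->owner_tid)
    tail = dbg->owner_tail;                      /* fast path */
  else
    {
      /* A second thread allocates: only now pay for the array.  */
      if (thread_id >= dbg->mem_nthreads)
        {
          size_t n = thread_id + 1;
          dbg->mem_tails = realloc (dbg->mem_tails,
                                    n * sizeof dbg->mem_tails[0]);
          memset (dbg->mem_tails + dbg->mem_nthreads, 0,
                  (n - dbg->mem_nthreads) * sizeof dbg->mem_tails[0]);
          dbg->mem_nthreads = n;
        }
      tail = dbg->mem_tails[thread_id];
    }

  pthread_mutex_unlock (&dbg->mem_lock);
  return tail;
}

Whether that is worth the extra branch and bookkeeping only a
real-world workload can tell, which is why I wouldn't do it until we
see a program where the mem_tails memory actually shows up.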