From: David Mosberger
Message-ID: <16490.8972.453326.222897@napali.hpl.hp.com>
Date: Wed, 31 Mar 2004 08:53:00 -0000
To: libc-hacker@sources.redhat.com
Cc: davidm@napali.hpl.hp.com
Subject: second thoughts on using dl_iterate_phdr() for cache-validation
Reply-To: davidm@hpl.hp.com

Doing some more careful performance analysis with libunwind, I'm finding
that the dl_iterate_phdr() call needed to verify that the phdr list didn't
change is rather expensive.  Specifically, the time needed to initialize an
"unwind cursor" (via unw_init_local()) is as follows:

  without dl_iterate_phdr() callback (unsafe):  140 ns
  with dl_iterate_phdr(), without -lpthread:    200 ns
  with dl_iterate_phdr(), with -lpthread:       300 ns

This is rather more than I expected, and a slow-down of more than a factor
of two for multi-threaded apps is a bit more than I'm willing to bear, since
it could really affect the usability of libunwind for things such as
allocation tracking or stack-trace-based sampling.

The profile for the "without -lpthread" case looks like this:

 % time   self  cumul  calls  self/call  tot/call  name
  60.44  13.05  13.05   101M       129n      204n  _ULia64_init_local
  17.19   3.71  16.76  99.6M      37.3n     66.7n  dl_iterate_phdr
   4.65   1.00  17.76   101M      9.97n     9.97n  rtld_lock_default_lock_recursive
   4.64   1.00  18.77  99.0M      10.1n     10.1n  rtld_lock_default_unlock_recursive

The profile for the "with -lpthread" case looks like this (it was measured
on a different machine, so the total time of 223 ns is not comparable to the
300 ns mentioned above; the relative times are fine, though):

 % time   self  cumul  calls  self/call  tot/call  name
  47.93  11.25  11.25  99.6M       113n      223n  _ULia64_init_local
  18.35   4.31  15.56   100M      43.0n     43.0n  pthread_mutex_lock
  11.81   2.77  18.33   100M      27.7n     27.7n  __pthread_mutex_unlock_usercnt
  11.65   2.73  21.06   100M      27.3n      103n  __GI___dl_iterate_phdr

For brevity, I didn't include the call graphs, but they are simple: all
calls to dl_iterate_phdr() are made indirectly by the cache-validation done
in _ULia64_init_local(), and almost all lock-related calls are due to
dl_iterate_phdr().

I suppose I could add a libunwind hack to disable cache-validation, but that
seems like a step backward, since it would make caching unsafe again.
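For concreteness, here is a minimal sketch of the kind of phdr-list check
this cache-validation implies.  The signature/checksum scheme and all names
below are illustrative only (they are not libunwind's actual internals);
the only real API assumed is glibc's documented dl_iterate_phdr():

/* Illustrative sketch: detect changes to the set of loaded objects by
   walking the phdr list and comparing a cheap signature against a cached
   one.  Names here are hypothetical, not libunwind internals.  */
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

struct phdr_signature
  {
    size_t count;		/* number of objects seen */
    unsigned long checksum;	/* cheap hash over load addresses */
  };

static int
signature_callback (struct dl_phdr_info *info, size_t size, void *data)
{
  struct phdr_signature *sig = data;

  (void) size;
  sig->count++;
  sig->checksum ^= (unsigned long) info->dlpi_addr + info->dlpi_phnum;
  return 0;			/* keep iterating over all objects */
}

/* Return nonzero if the phdr list appears unchanged since *CACHED was
   taken; otherwise refresh *CACHED so the caller can flush its cache.  */
static int
phdr_cache_still_valid (struct phdr_signature *cached)
{
  struct phdr_signature cur = { 0, 0 };

  /* This walk takes the loader lock internally, which is where the
     rtld lock / pthread_mutex time in the profiles above comes from.  */
  dl_iterate_phdr (signature_callback, &cur);

  if (cur.count == cached->count && cur.checksum == cached->checksum)
    return 1;

  *cached = cur;
  return 0;
}

Every unw_init_local() that validates this way pays for one full walk of the
phdr list plus a lock/unlock pair, which is consistent with the per-call
numbers in the profiles above.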
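And for reference, per-call numbers like those above can be obtained with a
loop as simple as the following.  This is a generic sketch against
libunwind's documented local-unwinding API, not the harness actually used
for the measurements:

/* Generic timing sketch, not the actual measurement harness: time
   unw_init_local() in a tight loop.  Build with -lunwind (and -lrt on
   older systems for clock_gettime).  */
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>
#include <time.h>

int
main (void)
{
  unw_context_t uc;
  unw_cursor_t cursor;
  struct timespec t0, t1;
  const long iters = 10000000;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < iters; i++)
    {
      unw_getcontext (&uc);		/* capture current machine state */
      unw_init_local (&cursor, &uc);	/* the call being measured */
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  printf ("%.0f ns per iteration (includes unw_getcontext overhead)\n",
	  ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters);
  return 0;
}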
In case it matters, the first profile was obtained with libc v2.3.2 and the
second with the CVS libc (as of a few days ago).

Can this be improved?

	--david