public inbox for libc-hacker@sourceware.org
 help / color / mirror / Atom feed
* second thoughts on using dl_iterate_phdr() for cache-validation
@ 2004-03-31  8:53 David Mosberger
  2004-03-31 16:04 ` Roland McGrath
  0 siblings, 1 reply; 22+ messages in thread
From: David Mosberger @ 2004-03-31  8:53 UTC (permalink / raw)
  To: libc-hacker; +Cc: davidm

Doing some more careful performance analysis with libunwind, I'm
finding that the dl_iterate_phdr() call needed to verify that the
phdr-list didn't change is rather expensive.  Specifically, the time
needed to initialize an "unwind cursor" (via unw_init_local()) is as
follows:

  without dl_iterate_phdr() callback (unsafe):	140 ns
  with dl_iterate_phdr(), without -lpthread:	200 ns
  with dl_iterate_phdr(), with -lpthread:	300 ns

This is rather more than I expected and a slow-down of more than a
factor of 2 for multi-threaded apps is bit more than I'm willing to
bear since it could really affect the usability of libunwind for
things such as allocation tracking or stack-trace-based sampling.

The profile for the "without -lpthread" case looks like this:

% time   self  cumul  calls self/call tot/call name
 60.44  13.05  13.05   101M      129n     204n _ULia64_init_local
 17.19   3.71  16.76  99.6M     37.3n    66.7n dl_iterate_phdr
  4.65   1.00  17.76   101M     9.97n    9.97n rtld_lock_default_lock_recursive
  4.64   1.00  18.77  99.0M     10.1n    10.1n rtld_lock_default_unlock_recursive

The profile for the "with -lpthread" case looks like this (this was
measured on a different machine, so the total time of 223 ns is not
comparable to the 300 ns mentioned above; the relative times are fine,
though):

% time   self  cumul  calls self/call tot/call name
 47.93  11.25  11.25  99.6M      113n     223n _ULia64_init_local
 18.35   4.31  15.56   100M     43.0n    43.0n pthread_mutex_lock
 11.81   2.77  18.33   100M     27.7n    27.7n __pthread_mutex_unlock_usercnt
 11.65   2.73  21.06   100M     27.3n     103n __GI___dl_iterate_phdr

For brevity, I didn't include the call-graphs, but they are pretty
easy: all calls to dl_iterate_phdr() are indirectly due to the
cache-validation done by _ULia64_init_local() and almost all
lock-related calls are due to dl_iterate_phdr().

I suppose I could add a libunwind-hack to disable cache-validation,
but that seems like a step backward since it would make caching unsafe
again.

In case it matters, the first profile was obtained with libc v2.3.2
and the second profile was obtained with the CVS libc (as of a few
days ago).

Can this be improved?

	--david

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: second thoughts on using dl_iterate_phdr() for cache-validation
@ 2004-04-15 18:28 David Mosberger
  0 siblings, 0 replies; 22+ messages in thread
From: David Mosberger @ 2004-04-15 18:28 UTC (permalink / raw)
  To: roland, libc-hacker

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 93 bytes --]

Roland,

Any comments on the proposal?  I'd really like to wrap this up.

Thanks,

	--david


[-- Attachment #2: forwarded message --]
[-- Type: message/rfc822, Size: 10305 bytes --]

From: David Mosberger <davidm@linux.hpl.hp.com>
To: Roland McGrath <roland@redhat.com>
Cc: davidm@hpl.hp.com, libc-hacker@sources.redhat.com
Subject: Re: second thoughts on using dl_iterate_phdr() for cache-validation
Date: Wed, 31 Mar 2004 18:32:57 -0800
Message-ID: <16491.32601.193260.288217@napali.hpl.hp.com>

>>>>> On Wed, 31 Mar 2004 00:04:25 -0800, Roland McGrath <roland@redhat.com> said:

  Roland> The only thing I see is to give you a way to see
  Roland> _dl_load_adds without calling __dl_iterate_phdr and taking
  Roland> locks, which is what you asked for in the first place.  We
  Roland> really can't give you an exported variable symbol, so it
  Roland> would have to be a new function interface.

How about something along the lines of this:

-----------
/* The value returned by this function increments by at least 1 when
   objects are added to the list visited by dl_iterate_phdr().  This
   function is not serialized with respect to dl_iterate_phdr().  If
   serialization is needed, this function should be called from within
   a dl_itereate_phdr() callback.  */
extern unsigned long long dl_phdr_additions_counter (void);

/* The vaue returned by this function increments by at least 1 when
   objects are removed from the list visited by dl_iterate_phdr().
   This function is not serialized with respect to dl_iterate_phdr().
   If serialization is needed, this function should be called from
   within a dl_itereate_phdr() callback.  */
extern unsigned long long dl_phdr_removals_counter (void);
-----------

I considered using a single function with an argument which specifies
the counter to read, but then I thought that many programs may only
read one or the other counter.  Plus if there is two counters, someone
might want to add a third counter in the future and if that needs to
be supported, there would have to be a way for detecting whether a
libc supports the new counter, etc., so I thought having two separate
routines might not be such a bad idea after all.

A preliminary but complete patch is below and it dropped the cost of
unw_init_local() from 223 to 119ns (on the faster machine I profiled
yesterday).  The code of unw_init_local() itself accounts for 113ns
and the overhead of 6ns is about in line with a shared library call.
This performance-level seems very acceptable to me.  Note that the way
things work in libunwind, I don't have to worry about concurrent
dl_close() operations and hence it is safe to read the counter without
locking.

Some questions/comments:

 - I always get confused about the versioning stuff in glibc.
   I added the new functions to elf/Versions, but am not sure
   this is really correct.

 - I wasn't sure what the right place for the new functions is.
   For now, I put them into dl-iteratephdr.c which is probably
   a bad choice.  Furthermore, I'm not sure about name-space
   pollution issues.  Advice on the naming etc. would be welcome.

 - For now, I didn't remove the dl_adds/dl_subs members from the
   dl_iterate_phdr() callback.  With the new interface, they won't be
   needed anymore and I suspect nobody other than libunwind would be
   affected if we removed them (and even old libunwind versions would
   continue work properly).  If you agree, I'll cook up a second patch
   to remove this stuff once the new interface is in.

Thanks,

	--david

Index: elf/Versions
===================================================================
RCS file: /cvs/glibc/libc/elf/Versions,v
retrieving revision 1.51
diff -u -r1.51 Versions
--- elf/Versions	6 Mar 2004 08:04:12 -0000	1.51
+++ elf/Versions	1 Apr 2004 02:18:54 -0000
@@ -11,6 +11,10 @@
   GLIBC_2.2.4 {
     dl_iterate_phdr;
   }
+  GLIBC_2.3.2 {
+    dl_phdr_additions_counter;
+    dl_phdr_removals_counter;
+  }
 %ifdef EXPORT_UNWIND_FIND_FDE
   GCC_3.0 {
     __register_frame_info_bases; __deregister_frame_info_bases;
Index: elf/dl-close.c
===================================================================
RCS file: /cvs/glibc/libc/elf/dl-close.c,v
retrieving revision 1.103
diff -u -r1.103 dl-close.c
--- elf/dl-close.c	7 Mar 2004 08:38:43 -0000	1.103
+++ elf/dl-close.c	1 Apr 2004 02:18:55 -0000
@@ -319,6 +319,7 @@
   /* Notify the debugger we are about to remove some loaded objects.  */
   _r_debug.r_state = RT_DELETE;
   GLRO(dl_debug_state) ();
+  ++GL(dl_load_subs);
 
 #ifdef USE_TLS
   size_t tls_free_start;
Index: elf/dl-iteratephdr.c
===================================================================
RCS file: /cvs/glibc/libc/elf/dl-iteratephdr.c,v
retrieving revision 1.11
diff -u -r1.11 dl-iteratephdr.c
--- elf/dl-iteratephdr.c	27 Jan 2004 21:29:22 -0000	1.11
+++ elf/dl-iteratephdr.c	1 Apr 2004 02:18:55 -0000
@@ -23,6 +23,19 @@
 #include <stddef.h>
 #include <bits/libc-lock.h>
 
+
+unsigned long long
+dl_phdr_additions_counter (void)
+{
+  return GL(dl_load_adds);
+}
+
+unsigned long long
+dl_phdr_removals_counter (void)
+{
+  return GL(dl_load_subs);
+}
+
 static void
 cancel_handler (void *arg __attribute__((unused)))
 {
Index: elf/dl-support.c
===================================================================
RCS file: /cvs/glibc/libc/elf/dl-support.c,v
retrieving revision 1.83
diff -u -r1.83 dl-support.c
--- elf/dl-support.c	6 Mar 2004 09:06:15 -0000	1.83
+++ elf/dl-support.c	1 Apr 2004 02:18:55 -0000
@@ -75,6 +75,9 @@
 /* Incremented whenever something may have been added to dl_loaded. */
 unsigned long long _dl_load_adds;
 
+/* Incremented whenever something may have been removed from dl_loaded. */
+unsigned long long _dl_load_subs;
+
 /* Fake scope.  In dynamically linked binaries this is the scope of the
    main application but here we don't have something like this.  So
    create a fake scope containing nothing.  */
Index: elf/tst-dlmodcount.c
===================================================================
RCS file: /cvs/glibc/libc/elf/tst-dlmodcount.c,v
retrieving revision 1.3
diff -u -r1.3 tst-dlmodcount.c
--- elf/tst-dlmodcount.c	26 Mar 2004 09:48:53 -0000	1.3
+++ elf/tst-dlmodcount.c	1 Apr 2004 02:18:55 -0000
@@ -71,6 +71,39 @@
   return -1;
 }
 
+static void
+check (int cmd)
+{
+  static int last_adds = 0, last_subs = 0;
+
+  printf ("  additions = %Lu removals = %Lu\n",
+	  dl_phdr_additions_counter (), dl_phdr_removals_counter ());
+
+  switch (cmd)
+    {
+    case SET:
+      break;
+
+    case ADD:
+      if (leq (dl_phdr_additions_counter (), last_adds))
+	{
+	  fprintf (stderr, "dlpi_adds failed to get incremented!\n");
+	  exit (3);
+	}
+      break;
+
+    case REMOVE:
+      if (leq (dl_phdr_removals_counter (), last_subs))
+	{
+	  fprintf (stderr, "dlpi_subs failed to get incremented!\n");
+	  exit (4);
+	}
+      break;
+    }
+  last_adds = dl_phdr_additions_counter ();
+  last_subs = dl_phdr_removals_counter ();
+}
+
 static void *
 load (const char *path)
 {
@@ -81,6 +114,7 @@
   if (!handle)
     exit (1);
   dl_iterate_phdr (callback, (void *)(intptr_t) ADD);
+  check (ADD);
   return handle;
 }
 
@@ -91,6 +125,7 @@
   if (dlclose (handle) < 0)
     exit (2);
   dl_iterate_phdr (callback, (void *)(intptr_t) REMOVE);
+  check (REMOVE);
 }
 
 int
@@ -99,6 +134,7 @@
   void *handle1, *handle2;
 
   dl_iterate_phdr (callback, (void *)(intptr_t) SET);
+  check (SET);
   handle1 = load ("firstobj.so");
   handle2 = load ("globalmod1.so");
   unload ("firstobj.so", handle1);
Index: include/link.h
===================================================================
RCS file: /cvs/glibc/libc/include/link.h,v
retrieving revision 1.31
diff -u -r1.31 link.h
--- include/link.h	27 Mar 2004 03:22:29 -0000	1.31
+++ include/link.h	1 Apr 2004 02:18:55 -0000
@@ -307,4 +307,18 @@
 					       size_t size, void *data),
 			      void *data);
 
+/* The value returned by this function increments by at least 1 when
+   objects are added to the list visited by dl_iterate_phdr().  This
+   function is not serialized with respect to dl_iterate_phdr().  If
+   serialization is needed, this function should be called from within
+   a dl_itereate_phdr() callback.  */
+extern unsigned long long dl_phdr_additions_counter (void);
+
+/* The vaue returned by this function increments by at least 1 when
+   objects are removed from the list visited by dl_iterate_phdr().
+   This function is not serialized with respect to dl_iterate_phdr().
+   If serialization is needed, this function should be called from
+   within a dl_itereate_phdr() callback.  */
+extern unsigned long long dl_phdr_removals_counter (void);
+
 #endif /* link.h */
Index: sysdeps/generic/ldsodefs.h
===================================================================
RCS file: /cvs/glibc/libc/sysdeps/generic/ldsodefs.h,v
retrieving revision 1.100
diff -u -r1.100 ldsodefs.h
--- sysdeps/generic/ldsodefs.h	27 Mar 2004 03:23:51 -0000	1.100
+++ sysdeps/generic/ldsodefs.h	1 Apr 2004 02:18:59 -0000
@@ -241,6 +241,9 @@
   /* Incremented whenever something may have been added to dl_loaded.  */
   EXTERN unsigned long long _dl_load_adds;
 
+  /* Incremented whenever something may have been removed from dl_loaded.  */
+  EXTERN unsigned long long _dl_load_subs;
+
 #ifndef MAP_ANON
   /* File descriptor referring to the zero-fill device.  */
   EXTERN int _dl_zerofd;

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2004-11-12 19:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-31  8:53 second thoughts on using dl_iterate_phdr() for cache-validation David Mosberger
2004-03-31 16:04 ` Roland McGrath
2004-04-01  2:33   ` David Mosberger
2004-11-04 23:41     ` David Mosberger
2004-11-04 23:53       ` Roland McGrath
2004-11-05  0:40         ` David Mosberger
2004-11-05  1:10           ` Roland McGrath
2004-11-05  1:28             ` David Mosberger
2004-11-05  1:36               ` Roland McGrath
2004-11-05  2:09                 ` David Mosberger
2004-11-05  3:27                   ` Roland McGrath
2004-11-06  0:21                     ` David Mosberger
2004-11-06  2:32                       ` Roland McGrath
2004-11-09  0:15                         ` David Mosberger
2004-11-09  4:40                           ` Roland McGrath
2004-11-10  5:40                             ` David Mosberger
2004-11-10 20:36                               ` David Mosberger
2004-11-10 20:45                               ` Roland McGrath
2004-11-11  6:17                                 ` David Mosberger
2004-11-11 22:24                                   ` Roland McGrath
2004-11-12 19:32                                     ` David Mosberger
2004-04-15 18:28 David Mosberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).