From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25147 invoked by alias); 17 Oct 2006 14:00:55 -0000 Received: (qmail 25120 invoked by uid 22791); 17 Oct 2006 14:00:53 -0000 X-Spam-Check-By: sourceware.org Received: from sunsite.ms.mff.cuni.cz (HELO sunsite.mff.cuni.cz) (195.113.15.26) by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 17 Oct 2006 14:00:43 +0000 Received: from sunsite.mff.cuni.cz (sunsite.mff.cuni.cz [127.0.0.1]) by sunsite.mff.cuni.cz (8.13.1/8.13.1) with ESMTP id k9HE0bKG022288; Tue, 17 Oct 2006 16:00:37 +0200 Received: (from jj@localhost) by sunsite.mff.cuni.cz (8.13.1/8.13.1/Submit) id k9HE0agU022284; Tue, 17 Oct 2006 16:00:36 +0200 Date: Tue, 17 Oct 2006 14:00:00 -0000 From: Jakub Jelinek To: Ulrich Drepper Cc: Glibc hackers Subject: [PATCH] Don't use catomic_* in __rtld_mrlock* Message-ID: <20061017140036.GE5868@sunsite.mff.cuni.cz> Reply-To: Jakub Jelinek Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i Mailing-List: contact libc-hacker-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-hacker-owner@sourceware.org X-SW-Source: 2006-10/txt/msg00010.txt.bz2 Hi! With the latest switch to catomic_* in __rtld_mrlock* and related code, I believe we are unnecessarily inefficient. E.g. in __rtld_mrlock_lock (l->l_scoperec_lock); scoperec = l->l_scoperec; catomic_increment (&scoperec->nusers); __rtld_mrlock_unlock (l->l_scoperec_lock); we at least 3 times read self->header.self and branch depending if it is zero or not. But we know that a valid program can't call pthread_create from within dl_lookup_symbol_x or signal handlers, therefore just one SINGLE_THREAD_P check should be enough. The following patch implements it, in the most performance critical places (dl-runtime.c and dl-sym.c) it means fewer executed instructions both in the single thread and multiple thread cases (in dl-sym.c even far faster for single thread case) and shorter code. 2006-10-17 Jakub Jelinek * elf/dl-runtime.c: Include sysdep-cancel.h. (_dl_fixup, _dl_profile_fixup): Use __rtld_mrlock_* and scoperec->nusers only if !SINGLE_THREAD_P. Use atomic_* instead of catomic_* macros. * elf/dl-sym.c: Include sysdep-cancel.h. (do_sym): Use __rtld_mrlock_* and scoperec->nusers only if !SINGLE_THREAD_P. Use atomic_* instead of catomic_* macros. * elf/dl-close.c: Include sysdep-cancel.h. (_dl_close): Use __rtld_mrlock_* and scoperec->nusers only if !SINGLE_THREAD_P. Use atomic_* instead of catomic_* macros. * elf/dl-open.c: Include sysdep-cancel.h. (dl_open_worker): Use __rtld_mrlock_* and scoperec->nusers only if !SINGLE_THREAD_P. Use atomic_* instead of catomic_* macros. nptl/ * sysdeps/unix/sysv/linux/rtld-lowlevel.h (__rtld_mrlock_lock, __rtld_mrlock_unlock, __rtld_mrlock_change, __rtld_mrlock_done): Use atomic_* instead of catomic_* macros. --- libc/elf/dl-runtime.c.jj 2006-10-17 00:12:31.000000000 +0200 +++ libc/elf/dl-runtime.c 2006-10-17 14:55:35.000000000 +0200 @@ -24,6 +24,7 @@ #include #include #include +#include #include "dynamic-link.h" #if (!defined ELF_MACHINE_NO_RELA && !defined ELF_MACHINE_PLT_REL) \ @@ -93,11 +94,11 @@ _dl_fixup ( } struct r_scoperec *scoperec = l->l_scoperec; - if (l->l_type == lt_loaded) + if (l->l_type == lt_loaded && !SINGLE_THREAD_P) { __rtld_mrlock_lock (l->l_scoperec_lock); scoperec = l->l_scoperec; - catomic_increment (&scoperec->nusers); + atomic_increment (&scoperec->nusers); __rtld_mrlock_unlock (l->l_scoperec_lock); } @@ -106,8 +107,8 @@ _dl_fixup ( ELF_RTYPE_CLASS_PLT, DL_LOOKUP_ADD_DEPENDENCY, NULL); - if (l->l_type == lt_loaded - && catomic_decrement_val (&scoperec->nusers) == 0 + if (l->l_type == lt_loaded && !SINGLE_THREAD_P + && atomic_decrement_val (&scoperec->nusers) == 0 && __builtin_expect (scoperec->remove_after_use, 0)) { if (scoperec->notify) @@ -195,11 +196,11 @@ _dl_profile_fixup ( } struct r_scoperec *scoperec = l->l_scoperec; - if (l->l_type == lt_loaded) + if (l->l_type == lt_loaded && !SINGLE_THREAD_P) { __rtld_mrlock_lock (l->l_scoperec_lock); scoperec = l->l_scoperec; - catomic_increment (&scoperec->nusers); + atomic_increment (&scoperec->nusers); __rtld_mrlock_unlock (l->l_scoperec_lock); } @@ -208,8 +209,8 @@ _dl_profile_fixup ( ELF_RTYPE_CLASS_PLT, DL_LOOKUP_ADD_DEPENDENCY, NULL); - if (l->l_type == lt_loaded - && catomic_decrement_val (&scoperec->nusers) == 0 + if (l->l_type == lt_loaded && !SINGLE_THREAD_P + && atomic_decrement_val (&scoperec->nusers) == 0 && __builtin_expect (scoperec->remove_after_use, 0)) { if (scoperec->notify) --- libc/elf/dl-sym.c.jj 2006-10-17 00:12:31.000000000 +0200 +++ libc/elf/dl-sym.c 2006-10-17 14:56:59.000000000 +0200 @@ -25,6 +25,7 @@ #include #include #include +#include #ifdef USE_TLS # include #endif @@ -115,7 +116,7 @@ do_sym (void *handle, const char *name, the initial binary. And then the more complex part where the object is dynamically loaded and the scope array can change. */ - if (match->l_type != lt_loaded) + if (match->l_type != lt_loaded || SINGLE_THREAD_P) result = GLRO(dl_lookup_symbol_x) (name, match, &ref, match->l_scoperec->scope, vers, 0, flags | DL_LOOKUP_ADD_DEPENDENCY, @@ -124,7 +125,7 @@ do_sym (void *handle, const char *name, { __rtld_mrlock_lock (match->l_scoperec_lock); struct r_scoperec *scoperec = match->l_scoperec; - catomic_increment (&scoperec->nusers); + atomic_increment (&scoperec->nusers); __rtld_mrlock_unlock (match->l_scoperec_lock); struct call_dl_lookup_args args; @@ -141,7 +142,7 @@ do_sym (void *handle, const char *name, int err = GLRO(dl_catch_error) (&objname, &errstring, &malloced, call_dl_lookup, &args); - if (catomic_decrement_val (&scoperec->nusers) == 0 + if (atomic_decrement_val (&scoperec->nusers) == 0 && __builtin_expect (scoperec->remove_after_use, 0)) { if (scoperec->notify) --- libc/elf/dl-close.c.jj 2006-10-17 00:12:31.000000000 +0200 +++ libc/elf/dl-close.c 2006-10-17 15:01:18.000000000 +0200 @@ -30,6 +30,7 @@ #include #include #include +#include /* Type of the constructor functions. */ @@ -419,16 +420,21 @@ _dl_close (void *_map) struct r_scoperec *old = imap->l_scoperec; - __rtld_mrlock_change (imap->l_scoperec_lock); - imap->l_scoperec = newp; - __rtld_mrlock_done (imap->l_scoperec_lock); - - if (catomic_increment_val (&old->nusers) != 1) + if (SINGLE_THREAD_P) + imap->l_scoperec = newp; + else { - old->remove_after_use = true; - old->notify = true; - if (catomic_decrement_val (&old->nusers) != 0) - __rtld_waitzero (old->nusers); + __rtld_mrlock_change (imap->l_scoperec_lock); + imap->l_scoperec = newp; + __rtld_mrlock_done (imap->l_scoperec_lock); + + if (atomic_increment_val (&old->nusers) != 1) + { + old->remove_after_use = true; + old->notify = true; + if (atomic_decrement_val (&old->nusers) != 0) + __rtld_waitzero (old->nusers); + } } /* No user anymore, we can free it now. */ --- libc/elf/dl-open.c.jj 2006-10-17 00:12:31.000000000 +0200 +++ libc/elf/dl-open.c 2006-10-17 15:30:16.000000000 +0200 @@ -31,6 +31,7 @@ #include #include #include +#include #include @@ -423,15 +424,20 @@ dl_open_worker (void *a) if (old == &imap->l_scoperec_mem) imap->l_scoperec = newp; + else if (SINGLE_THREAD_P) + { + imap->l_scoperec = newp; + free (old); + } else { __rtld_mrlock_change (imap->l_scoperec_lock); imap->l_scoperec = newp; __rtld_mrlock_done (imap->l_scoperec_lock); - catomic_increment (&old->nusers); + atomic_increment (&old->nusers); old->remove_after_use = true; - if (catomic_decrement_val (&old->nusers) == 0) + if (atomic_decrement_val (&old->nusers) == 0) /* No user, we can free it here and now. */ free (old); } --- libc/nptl/sysdeps/unix/sysv/linux/rtld-lowlevel.h.jj 2006-10-11 10:59:28.000000000 +0200 +++ libc/nptl/sysdeps/unix/sysv/linux/rtld-lowlevel.h 2006-10-17 14:52:22.000000000 +0200 @@ -62,9 +62,9 @@ typedef int __rtld_mrlock_t; { \ int newval = ((oldval & __RTLD_MRLOCK_RBITS) \ + __RTLD_MRLOCK_INC); \ - int ret = catomic_compare_and_exchange_val_acq (&(lock), \ - newval, \ - oldval); \ + int ret = atomic_compare_and_exchange_val_acq (&(lock), \ + newval, \ + oldval); \ if (__builtin_expect (ret == oldval, 1)) \ goto out; \ } \ @@ -72,7 +72,7 @@ typedef int __rtld_mrlock_t; } \ if ((oldval & __RTLD_MRLOCK_RWAIT) == 0) \ { \ - catomic_or (&(lock), __RTLD_MRLOCK_RWAIT); \ + atomic_or (&(lock), __RTLD_MRLOCK_RWAIT); \ oldval |= __RTLD_MRLOCK_RWAIT; \ } \ lll_futex_wait (lock, oldval); \ @@ -83,7 +83,7 @@ typedef int __rtld_mrlock_t; #define __rtld_mrlock_unlock(lock) \ do { \ - int oldval = catomic_exchange_and_add (&(lock), -__RTLD_MRLOCK_INC); \ + int oldval = atomic_exchange_and_add (&(lock), -__RTLD_MRLOCK_INC); \ if (__builtin_expect ((oldval \ & (__RTLD_MRLOCK_RBITS | __RTLD_MRLOCK_WWAIT)) \ == (__RTLD_MRLOCK_INC | __RTLD_MRLOCK_WWAIT), 0)) \ @@ -103,11 +103,11 @@ typedef int __rtld_mrlock_t; for (int tries = 0; tries < __RTLD_MRLOCK_TRIES; ++tries) \ { \ oldval = lock; \ - while (__builtin_expect ((oldval & __RTLD_MRLOCK_RBITS) == 0, 1))\ + while (__builtin_expect ((oldval & __RTLD_MRLOCK_RBITS) == 0, 1)) \ { \ int newval = ((oldval & __RTLD_MRLOCK_RWAIT) \ + __RTLD_MRLOCK_WRITER); \ - int ret = catomic_compare_and_exchange_val_acq (&(lock), \ + int ret = atomic_compare_and_exchange_val_acq (&(lock), \ newval, \ oldval); \ if (__builtin_expect (ret == oldval, 1)) \ @@ -115,7 +115,7 @@ typedef int __rtld_mrlock_t; } \ atomic_delay (); \ } \ - catomic_or (&(lock), __RTLD_MRLOCK_WWAIT); \ + atomic_or (&(lock), __RTLD_MRLOCK_WWAIT); \ oldval |= __RTLD_MRLOCK_WWAIT; \ lll_futex_wait (lock, oldval); \ } \ @@ -125,7 +125,7 @@ typedef int __rtld_mrlock_t; #define __rtld_mrlock_done(lock) \ do { \ - int oldval = catomic_exchange_and_add (&(lock), -__RTLD_MRLOCK_WRITER); \ + int oldval = atomic_exchange_and_add (&(lock), -__RTLD_MRLOCK_WRITER); \ if (__builtin_expect ((oldval & __RTLD_MRLOCK_RWAIT) != 0, 0)) \ lll_futex_wake (&(lock), 0x7fffffff); \ } while (0) Jakub