[PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
@ 2006-06-28 17:21 Jakub Jelinek
  2006-06-28 19:10 ` Paul Eggert
  2006-06-29 19:39 ` Michael Meeks
  0 siblings, 2 replies; 15+ messages in thread
From: Jakub Jelinek @ 2006-06-28 17:21 UTC (permalink / raw)
  To: binutils, libc-alpha; +Cc: Ulrich Drepper, Michael Meeks

[-- Attachment #1: Type: text/plain, Size: 4054 bytes --]

Hi!

The following patches introduce an optional ELF hash section
replacement, optimized for speed and data cache accesses, which in our
tests improves dynamic linking by about 50%.  Prelinking of course
eliminates the costs altogether but if prelinked apps use dlopen they
benefit for this portion.  This is incidently where some apps today
incur high costs today.

The initial design was done by Ulrich Drepper and was discussed with
Michael Meeks a few months ago as well.  But nothing came off of these
discussions and the proposed approach is quite different.

The binutils patch adds a new option to ld, --hash-style, which allows
selection of which hash sections to emit.  ld can either emit the old
style SHT_HASH section, or the new style SHT_GNU_HASH, or both (and
perhaps in the future there could be a mode in which it would emit
both, but SHT_HASH for slow compatibility only (minimal number of
buckets)).

The .gnu.hash section uses always sh_entsize 4 (so doesn't repeat the
historic mistakes on Alpha/s390x with .hash).  The first word there is
nbuckets like in .hash section, followed by nbuckets offsets into the
chains area of the new section.  If the offset is 0xffffffff, it means
there are no defined symbol names with hash % nbuckets equal to the
offset's position.  Otherwise, offset N means the corresponding chain
starts at offset (1 + nbuckets + N) * 4 into .gnu.hash section.  The
chain are does not mirror in size the symbol table anymore.

Each chain starts with a symbol index word, followed by chain length
word and then length times hash value of the corresponding symbol
name.  If DT_GNU_HASH is present, .dynsym must be sorted, so that all
symbols with the same name hash value % nbuckets are grouped together.
So, if some chain contains

symindx0 3 hash0 hash1 hash2

then dynamic symbol symindx0 has name hash hash0, symindx0+1 has hash1 and
symindx0+2 has hash2 and (hash0%nbuckets)==(hash1%nbuckets)==(hash2%nbuckets).

The hash function used is

static uint_fast32_t
dl_new_hash (const char *s)
{
  uint_fast32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h & 0xffffffff;
}

(Dan Bernstein's string hash function posted eons ago on comp.lang.c.)

For an unsuccessful lookup (the usual case), the dynamic linker has to
read the buckets[hash % nbuckets] (one cache line) and then go through
the chain if it is not 0xffffffff, comparing each hashN with hash and
as the hashN values are 32-bit and the hashing function spreads the
strings well, it is very likely that if the hash is equal, then we
have found the symbol (and just need to verify .dynsym/.dynstr), if it
is different from all hashN values in the chain, ld.so can go on with
another shared library.  Usually the whole chain will be in one cache
line or at most 2 cache lines.

We have tested a bunch of different hash functions on the set of ~
530000 unique dynamic symbols in one Linux installation and Dan
Bernstein's hash had the fewest collisions (only 29), with very short
maximum common prefixes for the colliding symbols (just 3 chars) and
is also very cheap to compute (h * 33 is (h << 5) + h), also both ld
-O1 and non-optimizing ld hash sizing gave good results with that hash
function.  The standard SysV ELF hash function gave bad results, on
the same set of symbols had 1726 collisions and some of the colliding
symbols had very long common name prefixes.  One of the reasons could
be that SysV ELF hash function is 28 bit, while Bernstein's is 32 bit.

Attached stats file shows the biggest benchmark we used (a proglet
that attempts to do all dynamic linking OpenOffice.org 2.0.3
swriter.bin does), most of the libraries (except about 8 libs from the
total of 146 libraries used by the testcase) were relinked with
-Wl,--hash-style=both and glibc on top of the DT_GNU_HASH support
patch had also a hack, where if LD_X=1 was in environment, it would
pretend DT_GNU_HASH is not present to allow easier benchmarking.  The
LD_X support will not be in the final glibc patch.

Ok for trunk?

	Jakub

[-- Attachment #2: binutils-2.17.50.0.2-hash-style.patch --]
[-- Type: text/plain, Size: 32224 bytes --]

2006-06-27  Jakub Jelinek  <jakub@redhat.com>

include/
	* bfdlink.h (struct bfd_link_info): Add emit_hash and
	emit_gnu_hash bitfields.
include/elf/
	* common.h (SHT_GNU_HASH, DT_GNU_HASH): Define.
ld/
	* scripttempl/elf.sc: Add .gnu.hash section.
	* emultempl/elf32.em (OPTION_HASH_STYLE): Define.
	(gld${EMULATION_NAME}_add_options): Register --hash-style option.
	(gld${EMULATION_NAME}_handle_option): Handle it.
	(gld${EMULATION_NAME}_list_options): Document it.
	* ldmain.c (main): Initialize emit_hash and emit_gnu_hash.
	* ld.texinfo: Document --hash-style option.
bfd/
	* elf.c (_bfd_elf_print_private_bfd_data): Handle DT_GNU_HASH.
	(bfd_section_from_shdr, elf_fake_sections, assign_section_numbers):
	Handle SHT_GNU_HASH.
	(special_sections_g): Include .gnu.hash section.
	(bfd_elf_gnu_hash): New function.
	* elf-bfd.h (bfd_elf_gnu_hash): New prototype.
	* elflink.c (_bfd_elf_link_create_dynamic_sections): Create .hash
	only if info->emit_hash, create .gnu.hash section if
	info->emit_gnu_hash.
	(struct collect_gnu_hash_codes): New type.
	(elf_collect_gnu_hash_codes, elf_renumber_gnu_hash_syms): New
	functions.
	(compute_bucket_count): Don't compute HASHCODES array, instead add
	that and NSYMS as arguments.  Use bed->s->sizeof_hash_entry
	instead of bed->s->arch_size / 8.  Fix .hash size estimation.
	When not optimizing, use the number of hashed symbols rather than
	dynsymcount.
	(bfd_elf_size_dynamic_sections): Only add DT_HASH if info->emit_hash,
	and ADD DT_GNU_HASH if info->emit_gnu_hash.
	(bfd_elf_size_dynsym_hash_dynstr): Size .hash only if info->emit_hash,
	adjust compute_bucket_count caller.  Create and populate .gnu.hash
	section if info->emit_gnu_hash.
	(elf_link_output_extsym): Only populate .hash section if
	finfo->hash_sec != NULL.
	(bfd_elf_final_link): Adjust assertion.  Handle DT_GNU_HASH.
binutils/
	* readelf.c (get_dynamic_type): Handle DT_GNU_HASH.
	(get_section_type_name): Handle SHT_GNU_HASH.
	(dynamic_info_DT_GNU_HASH): New variable.
	(process_dynamic_section): Handle DT_GNU_HASH.
	(process_symbol_table): Print also DT_GNU_HASH histogram.

--- ld/scripttempl/elf.sc.jj	2006-01-01 01:02:16.000000000 +0100
+++ ld/scripttempl/elf.sc	2006-06-22 11:11:53.000000000 +0200
@@ -260,6 +260,7 @@ SECTIONS
   ${INITIAL_READONLY_SECTIONS}
   ${TEXT_DYNAMIC+${DYNAMIC}}
   .hash         ${RELOCATING-0} : { *(.hash) }
+  .gnu.hash     ${RELOCATING-0} : { *(.gnu.hash) }
   .dynsym       ${RELOCATING-0} : { *(.dynsym) }
   .dynstr       ${RELOCATING-0} : { *(.dynstr) }
   .gnu.version  ${RELOCATING-0} : { *(.gnu.version) }
--- ld/ldmain.c.jj	2006-06-01 15:50:33.000000000 +0200
+++ ld/ldmain.c	2006-06-22 11:21:11.000000000 +0200
@@ -304,6 +304,8 @@ main (int argc, char **argv)
   link_info.create_object_symbols_section = NULL;
   link_info.gc_sym_list = NULL;
   link_info.base_file = NULL;
+  link_info.emit_hash = TRUE;
+  link_info.emit_gnu_hash = FALSE;
   /* SVR4 linkers seem to set DT_INIT and DT_FINI based on magic _init
      and _fini symbols.  We are compatible.  */
   link_info.init_function = "_init";
--- ld/ld.texinfo.jj	2006-06-15 14:31:06.000000000 +0200
+++ ld/ld.texinfo	2006-06-22 14:03:21.000000000 +0200
@@ -1883,6 +1883,14 @@ time it takes the linker to perform its 
 increasing the linker's memory requirements.  Similarly reducing this
 value can reduce the memory requirements at the expense of speed.
 
+@kindex --hash-style=@var{style}
+@item --hash-style=@var{style}
+Set the type of linker's hash table(s).  @var{style} can be either
+@code{sysv} for classic ELF @code{.hash} section, @code{gnu} for
+new style GNU @code{.gnu.hash} section or @code{both} for both
+the classic ELF @code{.hash} and new style GNU @code{.gnu.hash}
+hash tables.  The default is @code{sysv}.
+
 @kindex --reduce-memory-overheads
 @item --reduce-memory-overheads
 This option reduces memory requirements at ld runtime, at the expense of
--- ld/emultempl/elf32.em.jj	2006-06-20 18:34:24.000000000 +0200
+++ ld/emultempl/elf32.em	2006-06-22 14:39:25.000000000 +0200
@@ -1725,7 +1725,8 @@ cat >>e${EMULATION_NAME}.c <<EOF
 #define OPTION_GROUP			(OPTION_ENABLE_NEW_DTAGS + 1)
 #define OPTION_EH_FRAME_HDR		(OPTION_GROUP + 1)
 #define OPTION_EXCLUDE_LIBS		(OPTION_EH_FRAME_HDR + 1)
-  
+#define OPTION_HASH_STYLE		(OPTION_EXCLUDE_LIBS + 1)
+
 static void
 gld${EMULATION_NAME}_add_options
   (int ns, char **shortopts, int nl, struct option **longopts,
@@ -1741,6 +1742,7 @@ cat >>e${EMULATION_NAME}.c <<EOF
     {"enable-new-dtags", no_argument, NULL, OPTION_ENABLE_NEW_DTAGS},
     {"eh-frame-hdr", no_argument, NULL, OPTION_EH_FRAME_HDR},
     {"exclude-libs", required_argument, NULL, OPTION_EXCLUDE_LIBS},
+    {"hash-style", required_argument, NULL, OPTION_HASH_STYLE},
     {"Bgroup", no_argument, NULL, OPTION_GROUP},
 EOF
 fi
@@ -1797,6 +1799,22 @@ cat >>e${EMULATION_NAME}.c <<EOF
       add_excluded_libs (optarg);
       break;
 
+    case OPTION_HASH_STYLE:
+      link_info.emit_hash = FALSE;
+      link_info.emit_gnu_hash = FALSE;
+      if (strcmp (optarg, "sysv") == 0)
+	link_info.emit_hash = TRUE;
+      else if (strcmp (optarg, "gnu") == 0)
+	link_info.emit_gnu_hash = TRUE;
+      else if (strcmp (optarg, "both") == 0)
+	{
+	  link_info.emit_hash = TRUE;
+	  link_info.emit_gnu_hash = TRUE;
+	}
+      else
+	einfo (_("%P%F: invalid hash style \`%s'\n"), optarg);
+      break;
+
     case 'z':
       if (strcmp (optarg, "initfirst") == 0)
 	link_info.flags_1 |= (bfd_vma) DF_1_INITFIRST;
@@ -1895,6 +1913,7 @@ cat >>e${EMULATION_NAME}.c <<EOF
   fprintf (file, _("  --disable-new-dtags\tDisable new dynamic tags\n"));
   fprintf (file, _("  --enable-new-dtags\tEnable new dynamic tags\n"));
   fprintf (file, _("  --eh-frame-hdr\tCreate .eh_frame_hdr section\n"));
+  fprintf (file, _("  --hash-stylle=STYLE\tSet hash style to sysv, gnu or both\n"));
   fprintf (file, _("  -z combreloc\t\tMerge dynamic relocs into one section and sort\n"));
   fprintf (file, _("  -z defs\t\tReport unresolved symbols in object files.\n"));
   fprintf (file, _("  -z execstack\t\tMark executable as requiring executable stack\n"));
--- bfd/elf-bfd.h.jj	2006-06-20 18:34:24.000000000 +0200
+++ bfd/elf-bfd.h	2006-06-26 16:17:53.000000000 +0200
@@ -1481,6 +1481,8 @@ extern bfd_vma _bfd_elf_section_offset
 
 extern unsigned long bfd_elf_hash
   (const char *);
+extern unsigned long bfd_elf_gnu_hash
+  (const char *);
 
 extern bfd_reloc_status_type bfd_elf_generic_reloc
   (bfd *, arelent *, asymbol *, void *, asection *, bfd *, char **);
--- bfd/elf.c.jj	2006-06-20 18:34:24.000000000 +0200
+++ bfd/elf.c	2006-06-26 16:17:28.000000000 +0200
@@ -206,6 +206,21 @@ bfd_elf_hash (const char *namearg)
   return h & 0xffffffff;
 }
 
+/* DT_GNU_HASH hash function.  Do not change this function; you will
+   cause invalid hash tables to be generated.  */
+
+unsigned long
+bfd_elf_gnu_hash (const char *namearg)
+{
+  const unsigned char *name = (const unsigned char *) namearg;
+  unsigned long h = 5381;
+  unsigned char ch;
+
+  while ((ch = *name++) != '\0')
+    h = (h << 5) + h + ch;
+  return h & 0xffffffff;
+}
+
 bfd_boolean
 bfd_elf_mkobject (bfd *abfd)
 {
@@ -1239,6 +1254,7 @@ _bfd_elf_print_private_bfd_data (bfd *ab
 	    case DT_AUXILIARY: name = "AUXILIARY"; stringp = TRUE; break;
 	    case DT_USED: name = "USED"; break;
 	    case DT_FILTER: name = "FILTER"; stringp = TRUE; break;
+	    case DT_GNU_HASH: name = "GNU_HASH"; break;
 	    }
 
 	  fprintf (f, "  %-11s ", name);
@@ -1823,6 +1839,7 @@ bfd_section_from_shdr (bfd *abfd, unsign
     case SHT_FINI_ARRAY:	/* .fini_array section.  */
     case SHT_PREINIT_ARRAY:	/* .preinit_array section.  */
     case SHT_GNU_LIBLIST:	/* .gnu.liblist section.  */
+    case SHT_GNU_HASH:		/* .gnu.hash section.  */
       return _bfd_elf_make_section_from_shdr (abfd, hdr, name, shindex);
 
     case SHT_DYNAMIC:	/* Dynamic linking information.  */
@@ -2295,6 +2312,7 @@ static const struct bfd_elf_special_sect
   { ".gnu.version_r", 14,  0, SHT_GNU_verneed, 0 },
   { ".gnu.liblist",   12,  0, SHT_GNU_LIBLIST, SHF_ALLOC },
   { ".gnu.conflict",  13,  0, SHT_RELA,     SHF_ALLOC },
+  { ".gnu.hash",       9,  0, SHT_GNU_HASH, SHF_ALLOC },
   { NULL,              0,  0, 0,            0 }
 };
 
@@ -2811,6 +2829,10 @@ elf_fake_sections (bfd *abfd, asection *
     case SHT_GROUP:
       this_hdr->sh_entsize = 4;
       break;
+
+    case SHT_GNU_HASH:
+      this_hdr->sh_entsize = 4;
+      break;
     }
 
   if ((asect->flags & SEC_ALLOC) != 0)
@@ -3256,6 +3278,7 @@ assign_section_numbers (bfd *abfd, struc
 	  break;
 
 	case SHT_HASH:
+	case SHT_GNU_HASH:
 	case SHT_GNU_versym:
 	  /* sh_link is the section header index of the symbol table
 	     this hash table or version table is for.  */
--- bfd/elflink.c.jj	2006-06-20 18:34:53.000000000 +0200
+++ bfd/elflink.c	2006-06-26 20:07:29.000000000 +0200
@@ -240,12 +240,24 @@ _bfd_elf_link_create_dynamic_sections (b
   if (!_bfd_elf_define_linkage_sym (abfd, info, s, "_DYNAMIC"))
     return FALSE;
 
-  s = bfd_make_section_with_flags (abfd, ".hash",
-				   flags | SEC_READONLY);
-  if (s == NULL
-      || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
-    return FALSE;
-  elf_section_data (s)->this_hdr.sh_entsize = bed->s->sizeof_hash_entry;
+  if (info->emit_hash)
+    {
+      s = bfd_make_section_with_flags (abfd, ".hash", flags | SEC_READONLY);
+      if (s == NULL
+	  || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
+	return FALSE;
+      elf_section_data (s)->this_hdr.sh_entsize = bed->s->sizeof_hash_entry;
+    }
+
+  if (info->emit_gnu_hash)
+    {
+      s = bfd_make_section_with_flags (abfd, ".gnu.hash",
+				       flags | SEC_READONLY);
+      if (s == NULL
+	  || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
+	return FALSE;
+      elf_section_data (s)->this_hdr.sh_entsize = 4;
+    }
 
   /* Let the backend create the rest of the sections.  This lets the
      backend set the right flags.  The backend will normally create
@@ -4811,6 +4823,122 @@ elf_collect_hash_codes (struct elf_link_
   return TRUE;
 }
 
+struct collect_gnu_hash_codes
+{
+  bfd *output_bfd;
+  unsigned long int nsyms;
+  unsigned long int *hashcodes;
+  unsigned long int *hashval;
+  unsigned long int *indx;
+  unsigned long int *loc;
+  unsigned long int *counts;
+  bfd_byte *contents;
+  long int min_dynindx;
+  unsigned long int bucketcount;
+  long int local_indx;
+};
+
+/* This function will be called though elf_link_hash_traverse to store
+   all hash value of the exported symbols in an array.  */
+
+static bfd_boolean
+elf_collect_gnu_hash_codes (struct elf_link_hash_entry *h, void *data)
+{
+  struct collect_gnu_hash_codes *s = data;
+  const char *name;
+  char *p;
+  unsigned long ha;
+  char *alc = NULL;
+
+  if (h->root.type == bfd_link_hash_warning)
+    h = (struct elf_link_hash_entry *) h->root.u.i.link;
+
+  /* Ignore indirect symbols.  These are added by the versioning code.  */
+  if (h->dynindx == -1)
+    return TRUE;
+
+  /* Ignore also local symbols and undefined symbols.  */
+  if (h->forced_local
+      || h->root.type == bfd_link_hash_undefined
+      || h->root.type == bfd_link_hash_undefweak
+      || ((h->root.type == bfd_link_hash_defined
+	   || h->root.type == bfd_link_hash_defweak)
+	  && h->root.u.def.section->output_section == NULL))
+    return TRUE;
+
+  name = h->root.root.string;
+  p = strchr (name, ELF_VER_CHR);
+  if (p != NULL)
+    {
+      alc = bfd_malloc (p - name + 1);
+      memcpy (alc, name, p - name);
+      alc[p - name] = '\0';
+      name = alc;
+    }
+
+  /* Compute the hash value.  */
+  ha = bfd_elf_gnu_hash (name);
+
+  /* Store the found hash value in the array for compute_bucket_count,
+     and also for .dynsym reordering purposes.  */
+  s->hashcodes[s->nsyms] = ha;
+  s->hashval[h->dynindx] = ha;
+  ++s->nsyms;
+  if (s->min_dynindx < 0 || s->min_dynindx > h->dynindx)
+    s->min_dynindx = h->dynindx;
+
+  if (alc != NULL)
+    free (alc);
+
+  return TRUE;
+}
+
+/* This function will be called though elf_link_hash_traverse to do
+   final dynaminc symbol renumbering.  */
+
+static bfd_boolean
+elf_renumber_gnu_hash_syms (struct elf_link_hash_entry *h, void *data)
+{
+  struct collect_gnu_hash_codes *s = data;
+  unsigned long bucket;
+
+  if (h->root.type == bfd_link_hash_warning)
+    h = (struct elf_link_hash_entry *) h->root.u.i.link;
+
+  /* Ignore indirect symbols.  */
+  if (h->dynindx == -1)
+    return TRUE;
+
+  /* Ignore also local symbols and undefined symbols.  */
+  if (h->forced_local
+      || h->root.type == bfd_link_hash_undefined
+      || h->root.type == bfd_link_hash_undefweak
+      || ((h->root.type == bfd_link_hash_defined
+	   || h->root.type == bfd_link_hash_defweak)
+	  && h->root.u.def.section->output_section == NULL))
+    {
+      if (h->dynindx >= s->min_dynindx)
+	h->dynindx = s->local_indx++;
+      return TRUE;
+    }
+
+  bucket = s->hashval[h->dynindx] % s->bucketcount;
+  if (s->counts[bucket])
+    {
+      bfd_put_32 (s->output_bfd, s->indx[bucket],
+		  s->contents + s->loc[bucket] * 4);
+      bfd_put_32 (s->output_bfd, s->counts[bucket],
+		  s->contents + s->loc[bucket] * 4 + 4);
+      s->counts[bucket] = 0;
+      s->loc[bucket] += 2;
+    }
+  bfd_put_32 (s->output_bfd, s->hashval[h->dynindx],
+	      s->contents + s->loc[bucket] * 4);
+  ++s->loc[bucket];
+  h->dynindx = s->indx[bucket]++;
+  return TRUE;
+}
+
 /* Array used to determine the number of hash table buckets to use
    based on the number of symbols there are.  If there are fewer than
    3 symbols we use 1 bucket, fewer than 17 symbols we use 3 buckets,
@@ -4832,42 +4960,26 @@ static const size_t elf_buckets[] =
    Therefore the result is always a good payoff between few collisions
    (= short chain lengths) and table size.  */
 static size_t
-compute_bucket_count (struct bfd_link_info *info)
+compute_bucket_count (struct bfd_link_info *info, unsigned long int *hashcodes,
+		      unsigned long int nsyms)
 {
   size_t dynsymcount = elf_hash_table (info)->dynsymcount;
   size_t best_size = 0;
-  unsigned long int *hashcodes;
-  unsigned long int *hashcodesp;
   unsigned long int i;
   bfd_size_type amt;
 
-  /* Compute the hash values for all exported symbols.  At the same
-     time store the values in an array so that we could use them for
-     optimizations.  */
-  amt = dynsymcount;
-  amt *= sizeof (unsigned long int);
-  hashcodes = bfd_malloc (amt);
-  if (hashcodes == NULL)
-    return 0;
-  hashcodesp = hashcodes;
-
-  /* Put all hash values in HASHCODES.  */
-  elf_link_hash_traverse (elf_hash_table (info),
-			  elf_collect_hash_codes, &hashcodesp);
-
   /* We have a problem here.  The following code to optimize the table
      size requires an integer type with more the 32 bits.  If
      BFD_HOST_U_64_BIT is set we know about such a type.  */
 #ifdef BFD_HOST_U_64_BIT
   if (info->optimize)
     {
-      unsigned long int nsyms = hashcodesp - hashcodes;
       size_t minsize;
       size_t maxsize;
       BFD_HOST_U_64_BIT best_chlen = ~((BFD_HOST_U_64_BIT) 0);
-      unsigned long int *counts ;
       bfd *dynobj = elf_hash_table (info)->dynobj;
       const struct elf_backend_data *bed = get_elf_backend_data (dynobj);
+      unsigned long int *counts;
 
       /* Possible optimization parameters: if we have NSYMS symbols we say
 	 that the hashing table must at least have NSYMS/4 and at most
@@ -4883,10 +4995,7 @@ compute_bucket_count (struct bfd_link_in
       amt *= sizeof (unsigned long int);
       counts = bfd_malloc (amt);
       if (counts == NULL)
-	{
-	  free (hashcodes);
-	  return 0;
-	}
+	return 0;
 
       /* Compute the "optimal" size for the hash table.  The criteria is a
 	 minimal chain length.  The minor criteria is (of course) the size
@@ -4913,9 +5022,9 @@ compute_bucket_count (struct bfd_link_in
 #  define BFD_TARGET_PAGESIZE	(4096)
 # endif
 
-	  /* We in any case need 2 + NSYMS entries for the size values and
-	     the chains.  */
-	  max = (2 + nsyms) * (bed->s->arch_size / 8);
+	  /* We in any case need 2 + DYNSYMCOUNT entries for the size values
+	     and the chains.  */
+	  max = (2 + dynsymcount) * bed->s->sizeof_hash_entry;
 
 # if 1
 	  /* Variant 1: optimize for short chains.  We add the squares
@@ -4925,7 +5034,7 @@ compute_bucket_count (struct bfd_link_in
 	    max += counts[j] * counts[j];
 
 	  /* This adds penalties for the overall size of the table.  */
-	  fact = i / (BFD_TARGET_PAGESIZE / (bed->s->arch_size / 8)) + 1;
+	  fact = i / (BFD_TARGET_PAGESIZE / bed->s->sizeof_hash_entry) + 1;
 	  max *= fact * fact;
 # else
 	  /* Variant 2: Optimize a lot more for small table.  Here we
@@ -4936,7 +5045,7 @@ compute_bucket_count (struct bfd_link_in
 
 	  /* The overall size of the table is considered, but not as
 	     strong as in variant 1, where it is squared.  */
-	  fact = i / (BFD_TARGET_PAGESIZE / (bed->s->arch_size / 8)) + 1;
+	  fact = i / (BFD_TARGET_PAGESIZE / bed->s->sizeof_hash_entry) + 1;
 	  max *= fact;
 # endif
 
@@ -4959,14 +5068,11 @@ compute_bucket_count (struct bfd_link_in
       for (i = 0; elf_buckets[i] != 0; i++)
 	{
 	  best_size = elf_buckets[i];
-	  if (dynsymcount < elf_buckets[i + 1])
+	  if (nsyms < elf_buckets[i + 1])
 	    break;
 	}
     }
 
-  /* Free the arrays we needed.  */
-  free (hashcodes);
-
   return best_size;
 }
 
@@ -5324,7 +5430,10 @@ bfd_elf_size_dynamic_sections (bfd *outp
 	  bfd_size_type strsize;
 
 	  strsize = _bfd_elf_strtab_size (elf_hash_table (info)->dynstr);
-	  if (!_bfd_elf_add_dynamic_entry (info, DT_HASH, 0)
+	  if ((info->emit_hash
+	       && !_bfd_elf_add_dynamic_entry (info, DT_HASH, 0))
+	      || (info->emit_gnu_hash
+		  && !_bfd_elf_add_dynamic_entry (info, DT_GNU_HASH, 0))
 	      || !_bfd_elf_add_dynamic_entry (info, DT_STRTAB, 0)
 	      || !_bfd_elf_add_dynamic_entry (info, DT_SYMTAB, 0)
 	      || !_bfd_elf_add_dynamic_entry (info, DT_STRSZ, strsize)
@@ -5726,8 +5835,6 @@ bfd_elf_size_dynsym_hash_dynstr (bfd *ou
       asection *s;
       bfd_size_type dynsymcount;
       unsigned long section_sym_count;
-      size_t bucketcount = 0;
-      size_t hash_entry_size;
       unsigned int dtagcount;
 
       dynobj = elf_hash_table (info)->dynobj;
@@ -5778,23 +5885,151 @@ bfd_elf_size_dynsym_hash_dynstr (bfd *ou
 	  memset (s->contents, 0, section_sym_count * bed->s->sizeof_sym);
 	}
 
+      elf_hash_table (info)->bucketcount = 0;
+
       /* Compute the size of the hashing table.  As a side effect this
 	 computes the hash values for all the names we export.  */
-      bucketcount = compute_bucket_count (info);
+      if (info->emit_hash)
+	{
+	  unsigned long int *hashcodes;
+	  unsigned long int *hashcodesp;
+	  bfd_size_type amt;
+	  unsigned long int nsyms;
+	  size_t bucketcount;
+	  size_t hash_entry_size;
+
+	  /* Compute the hash values for all exported symbols.  At the same
+	     time store the values in an array so that we could use them for
+	     optimizations.  */
+	  amt = dynsymcount * sizeof (unsigned long int);
+	  hashcodes = bfd_malloc (amt);
+	  if (hashcodes == NULL)
+	    return FALSE;
+	  hashcodesp = hashcodes;
 
-      s = bfd_get_section_by_name (dynobj, ".hash");
-      BFD_ASSERT (s != NULL);
-      hash_entry_size = elf_section_data (s)->this_hdr.sh_entsize;
-      s->size = ((2 + bucketcount + dynsymcount) * hash_entry_size);
-      s->contents = bfd_zalloc (output_bfd, s->size);
-      if (s->contents == NULL)
-	return FALSE;
+	  /* Put all hash values in HASHCODES.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_collect_hash_codes, &hashcodesp);
 
-      bfd_put (8 * hash_entry_size, output_bfd, bucketcount, s->contents);
-      bfd_put (8 * hash_entry_size, output_bfd, dynsymcount,
-	       s->contents + hash_entry_size);
+	  nsyms = hashcodesp - hashcodes;
+	  bucketcount
+	    = compute_bucket_count (info, hashcodes, nsyms);
+	  free (hashcodes);
 
-      elf_hash_table (info)->bucketcount = bucketcount;
+	  if (bucketcount == 0)
+	    return FALSE;
+
+	  elf_hash_table (info)->bucketcount = bucketcount;
+
+	  s = bfd_get_section_by_name (dynobj, ".hash");
+	  BFD_ASSERT (s != NULL);
+	  hash_entry_size = elf_section_data (s)->this_hdr.sh_entsize;
+	  s->size = ((2 + bucketcount + dynsymcount) * hash_entry_size);
+	  s->contents = bfd_zalloc (output_bfd, s->size);
+	  if (s->contents == NULL)
+	    return FALSE;
+
+	  bfd_put (8 * hash_entry_size, output_bfd, bucketcount, s->contents);
+	  bfd_put (8 * hash_entry_size, output_bfd, dynsymcount,
+		   s->contents + hash_entry_size);
+	}
+
+      if (info->emit_gnu_hash)
+	{
+	  size_t i, loc, cnt;
+	  unsigned char *contents;
+	  struct collect_gnu_hash_codes cinfo;
+	  bfd_size_type amt;
+	  size_t bucketcount;
+
+	  memset (&cinfo, 0, sizeof (cinfo));
+
+	  /* Compute the hash values for all exported symbols.  At the same
+	     time store the values in an array so that we could use them for
+	     optimizations.  */
+	  amt = dynsymcount * 2 * sizeof (unsigned long int);
+	  cinfo.hashcodes = bfd_malloc (amt);
+	  if (cinfo.hashcodes == NULL)
+	    return FALSE;
+
+	  cinfo.hashval = cinfo.hashcodes + dynsymcount;
+	  cinfo.min_dynindx = -1;
+	  cinfo.output_bfd = output_bfd;
+
+	  /* Put all hash values in HASHCODES.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_collect_gnu_hash_codes, &cinfo);
+
+	  bucketcount
+	    = compute_bucket_count (info, cinfo.hashcodes, cinfo.nsyms);
+
+	  if (bucketcount == 0)
+	    {
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  amt = bucketcount * sizeof (unsigned long int) * 3;
+	  cinfo.counts = bfd_malloc (amt);
+	  if (cinfo.counts == NULL)
+	    {
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  /* Determine how often each hash bucket is used.  */
+	  memset (cinfo.counts, 0, bucketcount * sizeof (cinfo.counts[0]));
+	  for (i = 0; i < cinfo.nsyms; ++i)
+	    ++cinfo.counts[cinfo.hashcodes[i] % bucketcount];
+
+	  s = bfd_get_section_by_name (dynobj, ".gnu.hash");
+	  BFD_ASSERT (s != NULL);
+	  cinfo.indx = cinfo.counts + bucketcount;
+	  cinfo.loc = cinfo.counts + 2 * bucketcount;
+	  for (i = 0, loc = 0, cnt = dynsymcount - cinfo.nsyms;
+	       i < bucketcount; ++i)
+	    if (cinfo.counts[i] != 0)
+	      {
+		cinfo.indx[i] = cnt;
+		cinfo.loc[i] = loc;
+		loc += 2 + cinfo.counts[i];
+		cnt += cinfo.counts[i];
+	      }
+	  BFD_ASSERT (cnt == dynsymcount);
+	  cinfo.bucketcount = bucketcount;
+	  cinfo.local_indx = cinfo.min_dynindx;
+
+	  s->size = (1 + bucketcount + loc) * 4;
+	  contents = bfd_zalloc (output_bfd, s->size);
+	  if (contents == NULL)
+	    {
+	      free (cinfo.counts);
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  s->contents = contents;
+	  bfd_put_32 (output_bfd, bucketcount, contents);
+	  contents += 4;
+
+	  for (i = 0; i < bucketcount; ++i)
+	    {
+	      if (cinfo.counts[i] == 0)
+		bfd_put_32 (output_bfd, ~0, contents);
+	      else
+		bfd_put_32 (output_bfd, cinfo.loc[i], contents);
+	      contents += 4;
+	    }
+
+	  cinfo.contents = contents;
+
+	  /* Renumber dynamic symbols, populate .gnu.hash section.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_renumber_gnu_hash_syms, &cinfo);
+
+	  free (cinfo.counts);
+	  free (cinfo.hashcodes);
+	}
 
       s = bfd_get_section_by_name (dynobj, ".dynstr");
       BFD_ASSERT (s != NULL);
@@ -6663,9 +6898,6 @@ elf_link_output_extsym (struct elf_link_
     {
       size_t bucketcount;
       size_t bucket;
-      size_t hash_entry_size;
-      bfd_byte *bucketpos;
-      bfd_vma chain;
       bfd_byte *esym;
 
       sym.st_name = h->dynstr_index;
@@ -6679,15 +6911,23 @@ elf_link_output_extsym (struct elf_link_
 
       bucketcount = elf_hash_table (finfo->info)->bucketcount;
       bucket = h->u.elf_hash_value % bucketcount;
-      hash_entry_size
-	= elf_section_data (finfo->hash_sec)->this_hdr.sh_entsize;
-      bucketpos = ((bfd_byte *) finfo->hash_sec->contents
-		   + (bucket + 2) * hash_entry_size);
-      chain = bfd_get (8 * hash_entry_size, finfo->output_bfd, bucketpos);
-      bfd_put (8 * hash_entry_size, finfo->output_bfd, h->dynindx, bucketpos);
-      bfd_put (8 * hash_entry_size, finfo->output_bfd, chain,
-	       ((bfd_byte *) finfo->hash_sec->contents
-		+ (bucketcount + 2 + h->dynindx) * hash_entry_size));
+
+      if (finfo->hash_sec != NULL)
+	{
+	  size_t hash_entry_size;
+	  bfd_byte *bucketpos;
+	  bfd_vma chain;
+
+	  hash_entry_size
+	    = elf_section_data (finfo->hash_sec)->this_hdr.sh_entsize;
+	  bucketpos = ((bfd_byte *) finfo->hash_sec->contents
+		       + (bucket + 2) * hash_entry_size);
+	  chain = bfd_get (8 * hash_entry_size, finfo->output_bfd, bucketpos);
+	  bfd_put (8 * hash_entry_size, finfo->output_bfd, h->dynindx, bucketpos);
+	  bfd_put (8 * hash_entry_size, finfo->output_bfd, chain,
+		   ((bfd_byte *) finfo->hash_sec->contents
+		    + (bucketcount + 2 + h->dynindx) * hash_entry_size));
+	}
 
       if (finfo->symver_sec != NULL && finfo->symver_sec->contents != NULL)
 	{
@@ -7861,7 +8101,7 @@ bfd_elf_final_link (bfd *abfd, struct bf
     {
       finfo.dynsym_sec = bfd_get_section_by_name (dynobj, ".dynsym");
       finfo.hash_sec = bfd_get_section_by_name (dynobj, ".hash");
-      BFD_ASSERT (finfo.dynsym_sec != NULL && finfo.hash_sec != NULL);
+      BFD_ASSERT (finfo.dynsym_sec != NULL);
       finfo.symver_sec = bfd_get_section_by_name (dynobj, ".gnu.version");
       /* Note that it is OK if symver_sec is NULL.  */
     }
@@ -8621,6 +8861,9 @@ bfd_elf_final_link (bfd *abfd, struct bf
 	    case DT_HASH:
 	      name = ".hash";
 	      goto get_vma;
+	    case DT_GNU_HASH:
+	      name = ".gnu.hash";
+	      goto get_vma;
 	    case DT_STRTAB:
 	      name = ".dynstr";
 	      goto get_vma;
--- include/elf/common.h.jj	2006-02-17 15:36:26.000000000 +0100
+++ include/elf/common.h	2006-06-22 10:43:21.000000000 +0200
@@ -338,6 +338,7 @@
 #define SHT_LOOS	0x60000000	/* First of OS specific semantics */
 #define SHT_HIOS	0x6fffffff	/* Last of OS specific semantics */
 
+#define SHT_GNU_HASH	0x6ffffff6	/* GNU style symbol hash table */
 #define SHT_GNU_LIBLIST	0x6ffffff7	/* List of prelink dependencies */
 
 /* The next three section types are defined by Solaris, and are named
@@ -577,6 +578,7 @@
 #define DT_VALRNGHI	0x6ffffdff
 
 #define DT_ADDRRNGLO	0x6ffffe00
+#define DT_GNU_HASH	0x6ffffef5
 #define DT_TLSDESC_PLT	0x6ffffef6
 #define DT_TLSDESC_GOT	0x6ffffef7
 #define DT_GNU_CONFLICT	0x6ffffef8
--- include/bfdlink.h.jj	2006-04-07 17:17:29.000000000 +0200
+++ include/bfdlink.h	2006-06-22 11:11:20.000000000 +0200
@@ -324,6 +324,12 @@ struct bfd_link_info
   /* TRUE if unreferenced sections should be removed.  */
   unsigned int gc_sections: 1;
 
+  /* TRUE if .hash section should be created.  */
+  unsigned int emit_hash: 1;
+
+  /* TRUE if .gnu.hash section should be created.  */
+  unsigned int emit_gnu_hash: 1;
+
   /* What to do with unresolved symbols in an object file.
      When producing executables the default is GENERATE_ERROR.
      When producing shared libraries the default is IGNORE.  The
--- binutils/readelf.c.jj	2006-05-30 16:13:54.000000000 +0200
+++ binutils/readelf.c	2006-06-27 11:58:34.000000000 +0200
@@ -135,6 +135,7 @@ static unsigned long dynamic_syminfo_off
 static unsigned int dynamic_syminfo_nent;
 static char program_interpreter[64];
 static bfd_vma dynamic_info[DT_JMPREL + 1];
+static bfd_vma dynamic_info_DT_GNU_HASH;
 static bfd_vma version_info[16];
 static Elf_Internal_Ehdr elf_header;
 static Elf_Internal_Shdr *section_headers;
@@ -1501,6 +1502,7 @@ get_dynamic_type (unsigned long type)
     case DT_GNU_CONFLICTSZ: return "GNU_CONFLICTSZ";
     case DT_GNU_LIBLIST: return "GNU_LIBLIST";
     case DT_GNU_LIBLISTSZ: return "GNU_LIBLISTSZ";
+    case DT_GNU_HASH:	return "GNU_HASH";
 
     default:
       if ((type >= DT_LOPROC) && (type <= DT_HIPROC))
@@ -2571,6 +2573,7 @@ get_section_type_name (unsigned int sh_t
     case SHT_INIT_ARRAY:	return "INIT_ARRAY";
     case SHT_FINI_ARRAY:	return "FINI_ARRAY";
     case SHT_PREINIT_ARRAY:	return "PREINIT_ARRAY";
+    case SHT_GNU_HASH:		return "GNU_HASH";
     case SHT_GROUP:		return "GROUP";
     case SHT_SYMTAB_SHNDX:	return "SYMTAB SECTION INDICIES";
     case SHT_GNU_verdef:	return "VERDEF";
@@ -6228,6 +6231,15 @@ process_dynamic_section (FILE *file)
 	    }
 	  break;
 
+	case DT_GNU_HASH:
+	  dynamic_info_DT_GNU_HASH = entry->d_un.d_val;
+	  if (do_dynamic)
+	    {
+	      print_vma (entry->d_un.d_val, PREFIX_HEX);
+	      putchar ('\n');
+	    }
+	  break;
+
 	default:
 	  if ((entry->d_tag >= DT_VERSYM) && (entry->d_tag <= DT_VERNEEDNUM))
 	    version_info[DT_VERSIONTAGIDX (entry->d_tag)] =
@@ -6903,6 +6915,9 @@ process_symbol_table (FILE *file)
   bfd_vma nchains = 0;
   bfd_vma *buckets = NULL;
   bfd_vma *chains = NULL;
+  bfd_vma ngnubuckets = 0;
+  bfd_vma *gnubuckets = NULL;
+  bfd_vma *gnuchains = NULL;
 
   if (! do_syms && !do_histogram)
     return 1;
@@ -7282,6 +7297,132 @@ process_symbol_table (FILE *file)
       free (chains);
     }
 
+  if (do_histogram && dynamic_info_DT_GNU_HASH)
+    {
+      unsigned char nb[4];
+      bfd_vma i, maxchain = 0xffffffff;
+      unsigned long *counts;
+      unsigned long hn;
+      unsigned long maxlength = 0;
+      unsigned long nzero_counts = 0;
+      unsigned long nsyms = 0;
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file, dynamic_info_DT_GNU_HASH,
+				     sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      if (fread (nb, 4, 1, file) != 1)
+	{
+	  error (_("Failed to read in number of buckets\n"));
+	  return 0;
+	}
+
+      ngnubuckets = byte_get (nb, 4);
+
+      gnubuckets = get_dynamic_data (file, ngnubuckets, 4);
+
+      if (gnubuckets == NULL)
+	return 0;
+
+      for (i = 0; i < ngnubuckets; i++)
+	if (gnubuckets[i] != 0xffffffff
+	    && (maxchain == 0xffffffff || gnubuckets[i] > maxchain))
+	  maxchain = gnubuckets[i];
+
+      if (maxchain == 0xffffffff)
+	return 0;
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file,
+				     dynamic_info_DT_GNU_HASH
+				     + 4 * (1 + ngnubuckets + maxchain + 1),
+				     sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      if (fread (nb, 4, 1, file) != 1)
+	{
+	  error (_("Failed to read last chain length\n"));
+	  return 0;
+	}
+
+      i = 2 + byte_get (nb, 4);
+      if (maxchain + i < maxchain)
+	return 0;
+      maxchain += i;
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file,
+				     dynamic_info_DT_GNU_HASH
+				     + 4 * (1 + ngnubuckets), sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      gnuchains = get_dynamic_data (file, maxchain, 4);
+
+      if (gnuchains == NULL)
+	return 0;
+
+      printf (_("\nHistogram for `.gnu.hash' bucket list length (total of %lu buckets):\n"),
+	      (unsigned long) ngnubuckets);
+      printf (_(" Length  Number     %% of total  Coverage\n"));
+
+      for (hn = 0; hn < ngnubuckets; ++hn)
+	if (gnubuckets[hn] != 0xffffffff)
+	  {
+	    bfd_vma length = gnuchains[gnubuckets[hn] + 1];
+	    if (length > maxlength)
+	      maxlength = length;
+	    nsyms += length;
+	  }
+
+      counts = calloc (maxlength + 1, sizeof (*counts));
+      if (counts == NULL)
+	{
+	  error (_("Out of memory"));
+	  return 0;
+	}
+
+      for (hn = 0; hn < ngnubuckets; ++hn)
+	if (gnubuckets[hn] != 0xffffffff)
+	  ++counts[gnuchains[gnubuckets[hn] + 1]];
+	else
+	  ++counts[0];
+
+      if (ngnubuckets > 0)
+	{
+	  unsigned long j;
+	  printf ("      0  %-10lu (%5.1f%%)\n",
+		  counts[0], (counts[0] * 100.0) / ngnubuckets);
+	  for (j = 1; j <= maxlength; ++j)
+	    {
+	      nzero_counts += counts[j] * j;
+	      printf ("%7lu  %-10lu (%5.1f%%)    %5.1f%%\n",
+		      j, counts[j], (counts[j] * 100.0) / ngnubuckets,
+		      (nzero_counts * 100.0) / nsyms);
+	    }
+	}
+
+      free (counts);
+
+      free (gnubuckets);
+      free (gnuchains);
+    }
+
   return 1;
 }
 

[-- Attachment #3: glibc-gnu-hash.patch --]
[-- Type: text/plain, Size: 18242 bytes --]

2006-06-28  Ulrich Drepper  <drepper@redhat.com>

	* elf/dl-lookup.c (dl_new_hash): New functions.
	(_dl_lookup_symbol_x): Rename hash to old_hash and don't compute
	value here.  Compute new-style hash value.  Pass new hash value
	and reference to variable with the old value to do_lookup_x.
	(_dl_setup_hash): If DT_GNU_HASH is defined, use it and not
	old-style hash table.
	(_dl_debug_bindings): Pass new hash value and reference to variable
	with the old value to do_lookup_x.
	* elf/do-lookup.h (do_lookup_x): Accept additional parameter with
	new-style hash value and change old-style hash value parameter to
	be a reference.  Reoganize functions to determine whether
	new-style hash table is available.  Only fall back on old-style
	table.  If old-style hash value is needed, compute it here.
	* elf/dynamic-link.h (elf_get_dynamic_info): Relocate DT_GNU_HASH
	entry.
	* elf/elf.h: Define SHT_GNU_HASH, DT_GNU_HASH, DT_TLSDEC_PLT,
	DT_TLSDEC_GOT.  Adjust DT_ADDRNUM.
	* include/link.h (struct link_map): Add l_gnu_buckets and
	l_gnu_hashbase.
	* Makeconfig: If linker supports --hash-style option add it to all
	linker command lines to build DSOs.
	* config.make.in: Define have-hash-style.
	* configure.in: Test whether linker supports --hash-style option.

--- libc/Makeconfig.~1.318.~	2006-05-15 11:21:55.000000000 -0700
+++ libc/Makeconfig	2006-06-26 13:51:15.000000000 -0700
@@ -413,6 +413,15 @@
 LDFLAGS-rtld += $(relro-LDFLAGS)
 endif
 
+ifeq (yes,$(have-hash-style))
+# For the time being we unconditionally use 'both'.  At some time we
+# should declare statically linked code as 'out of luck' and compile
+# with --hash-style=gnu only.
+hashstyle-LDFLAGS = -Wl,--hash-style=both
+LDFLAGS.so += $(hashstyle-LDFLAGS)
+LDFLAGS-rtld += $(hashstyle-LDFLAGS)
+endif
+
 # Command for linking programs with the C library.
 ifndef +link
 +link = $(CC) -nostdlib -nostartfiles -o $@ \
--- libc/config.make.in.~1.118.~	2006-04-26 08:17:41.000000000 -0700
+++ libc/config.make.in	2006-06-26 13:49:14.000000000 -0700
@@ -65,6 +65,7 @@
 have-cc-with-libunwind = @libc_cv_cc_with_libunwind@
 fno-unit-at-a-time = @fno_unit_at_a_time@
 bind-now = @bindnow@
+have-hash-style = @libc_cv_hashstyle@
 
 static-libgcc = @libc_cv_gcc_static_libgcc@
 
--- libc/configure.in.~1.460.~	2006-04-26 08:19:22.000000000 -0700
+++ libc/configure.in	2006-06-26 13:47:01.000000000 -0700
@@ -1589,6 +1589,22 @@
   rm -f conftest*])
 
   AC_SUBST(libc_cv_fpie)
+
+  AC_CACHE_CHECK(for --hash-style option,
+		 libc_cv_hashstyle, [dnl
+  cat > conftest.c <<EOF
+int _start (void) { return 42; }
+EOF
+  if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS
+			      -fPIC -shared -o conftest.so conftest.c
+			      -Wl,--hash-style=both -nostdlib 1>&AS_MESSAGE_LOG_FD])
+  then
+    libc_cv_hashstyle=yes
+  else
+    libc_cv_hashstyle=no
+  fi
+  rm -f conftest*])
+  AC_SUBST(libc_cv_hashstyle)
 fi
 
 AC_CACHE_CHECK(for -fno-toplevel-reorder, libc_cv_fno_toplevel_reorder, [dnl
--- libc/elf/dl-lookup.c.jj	2005-12-27 11:56:45.000000000 +0100
+++ libc/elf/dl-lookup.c	2006-06-27 10:12:22.000000000 +0200
@@ -1,5 +1,5 @@
 /* Look up a symbol in the loaded objects.
-   Copyright (C) 1995-2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+   Copyright (C) 1995-2005, 2006 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -72,6 +72,16 @@ struct sym_val
 #include "do-lookup.h"
 
 
+static uint_fast32_t
+dl_new_hash (const char *s)
+{
+  uint_fast32_t h = 5381;
+  for (unsigned char c = *s; c != '\0'; c = *++s)
+    h = h * 33 + c;
+  return h & 0xffffffff;
+}
+
+
 /* Add extra dependency on MAP to UNDEF_MAP.  */
 static int
 internal_function
@@ -206,7 +216,8 @@ _dl_lookup_symbol_x (const char *undef_n
 		     const struct r_found_version *version,
 		     int type_class, int flags, struct link_map *skip_map)
 {
-  const unsigned long int hash = _dl_elf_hash (undef_name);
+  const uint_fast32_t new_hash = dl_new_hash (undef_name);
+  unsigned long int old_hash = 0xffffffff;
   struct sym_val current_value = { NULL, NULL };
   struct r_scope_elem **scope = symbol_scope;
 
@@ -229,8 +240,9 @@ _dl_lookup_symbol_x (const char *undef_n
   /* Search the relevant loaded objects for a definition.  */
   for (size_t start = i; *scope != NULL; start = 0, ++scope)
     {
-      int res = do_lookup_x (undef_name, hash, *ref, &current_value, *scope,
-			     start, version, flags, skip_map, type_class);
+      int res = do_lookup_x (undef_name, new_hash, &old_hash, *ref,
+			     &current_value, *scope, start, version, flags,
+			     skip_map, type_class);
       if (res > 0)
 	break;
 
@@ -301,9 +313,9 @@ _dl_lookup_symbol_x (const char *undef_n
 	  struct sym_val protected_value = { NULL, NULL };
 
 	  for (scope = symbol_scope; *scope != NULL; i = 0, ++scope)
-	    if (do_lookup_x (undef_name, hash, *ref, &protected_value,
-			     *scope, i, version, flags, skip_map,
-			     ELF_RTYPE_CLASS_PLT) != 0)
+	    if (do_lookup_x (undef_name, new_hash, &old_hash, *ref,
+			     &protected_value, *scope, i, version, flags,
+			     skip_map, ELF_RTYPE_CLASS_PLT) != 0)
 	      break;
 
 	  if (protected_value.s != NULL && protected_value.m != undef_map)
@@ -352,6 +364,21 @@ _dl_setup_hash (struct link_map *map)
   Elf_Symndx *hash;
   Elf_Symndx nchain;
 
+  if (__builtin_expect (map->l_info[DT_ADDRTAGIDX (DT_GNU_HASH) + DT_NUM
+  				    + DT_THISPROCNUM + DT_VERSIONTAGNUM
+				    + DT_EXTRANUM + DT_VALNUM] != NULL, 1))
+    {
+      Elf32_Word *hash32
+	= (void *) D_PTR (map, l_info[DT_ADDRTAGIDX (DT_GNU_HASH) + DT_NUM
+				      + DT_THISPROCNUM + DT_VERSIONTAGNUM
+				      + DT_EXTRANUM + DT_VALNUM]);
+      map->l_nbuckets = *hash32++;
+      map->l_gnu_buckets = hash32;
+      hash32 += map->l_nbuckets;
+      map->l_gnu_hashbase = hash32;
+      return;
+    }
+
   if (!map->l_info[DT_HASH])
     return;
   hash = (void *) D_PTR (map, l_info[DT_HASH]);
@@ -399,9 +426,10 @@ _dl_debug_bindings (const char *undef_na
 	   || GLRO(dl_trace_prelink_map) == GL(dl_ns)[LM_ID_BASE]._ns_loaded)
 	  && undef_map != GL(dl_ns)[LM_ID_BASE]._ns_loaded)
 	{
-	  const unsigned long int hash = _dl_elf_hash (undef_name);
+	  const uint_fast32_t new_hash = dl_new_hash (undef_name);
+	  unsigned long int old_hash = 0xffffffff;
 
-	  do_lookup_x (undef_name, hash, *ref, &val,
+	  do_lookup_x (undef_name, new_hash, &old_hash, *ref, &val,
 		       undef_map->l_local_scope[0], 0, version, 0, NULL,
 		       type_class);
 
--- libc/elf/do-lookup.h.jj	2006-02-28 15:13:56.000000000 +0100
+++ libc/elf/do-lookup.h	2006-06-27 10:08:03.000000000 +0200
@@ -17,14 +17,15 @@
    Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
    02111-1307 USA.  */
 
+
 /* Inner part of the lookup functions.  We return a value > 0 if we
    found the symbol, the value 0 if nothing is found and < 0 if
    something bad happened.  */
 static int
 __attribute_noinline__
-do_lookup_x (const char *undef_name, unsigned long int hash,
-	     const ElfW(Sym) *ref, struct sym_val *result,
-	     struct r_scope_elem *scope, size_t i,
+do_lookup_x (const char *undef_name, uint_fast32_t new_hash,
+	     unsigned long int *old_hash, const ElfW(Sym) *ref,
+	     struct sym_val *result, struct r_scope_elem *scope, size_t i,
 	     const struct r_found_version *const version, int flags,
 	     struct link_map *skip, int type_class)
 {
@@ -67,105 +68,142 @@ do_lookup_x (const char *undef_name, uns
       strtab = (const void *) D_PTR (map, l_info[DT_STRTAB]);
       verstab = map->l_versyms;
 
-      /* Search the appropriate hash bucket in this object's symbol table
-	 for a definition for the same symbol name.  */
-      for (symidx = map->l_buckets[hash % map->l_nbuckets];
-	   symidx != STN_UNDEF;
-	   symidx = map->l_chain[symidx])
-	{
-	  sym = &symtab[symidx];
-
-	  assert (ELF_RTYPE_CLASS_PLT == 1);
-	  if ((sym->st_value == 0 /* No value.  */
+      const ElfW(Sym) *
+      __attribute_noinline__
+      check_match (const ElfW(Sym) *sym)
+      {
+	assert (ELF_RTYPE_CLASS_PLT == 1);
+	if (__builtin_expect ((sym->st_value == 0 /* No value.  */
 #ifdef USE_TLS
-	       && ELFW(ST_TYPE) (sym->st_info) != STT_TLS
+			       && ELFW(ST_TYPE) (sym->st_info) != STT_TLS
 #endif
-	       )
-	      || (type_class & (sym->st_shndx == SHN_UNDEF)))
-	    continue;
+			       )
+			      || (type_class & (sym->st_shndx == SHN_UNDEF)),
+			      0))
+	  return NULL;
 
-	  if (ELFW(ST_TYPE) (sym->st_info) > STT_FUNC
+	if (__builtin_expect (ELFW(ST_TYPE) (sym->st_info) > STT_FUNC
 #ifdef USE_TLS
-	      && ELFW(ST_TYPE) (sym->st_info) != STT_TLS
+			      && ELFW(ST_TYPE) (sym->st_info) != STT_TLS
 #endif
-	      )
-	    /* Ignore all but STT_NOTYPE, STT_OBJECT and STT_FUNC
-	       entries (and STT_TLS if TLS is supported) since these
-	       are no code/data definitions.  */
-	    continue;
-
-	  if (sym != ref && strcmp (strtab + sym->st_name, undef_name))
-	    /* Not the symbol we are looking for.  */
-	    continue;
+			      , 0))
+	  /* Ignore all but STT_NOTYPE, STT_OBJECT and STT_FUNC
+	     entries (and STT_TLS if TLS is supported) since these
+	     are no code/data definitions.  */
+	  return NULL;
+
+	if (sym != ref && strcmp (strtab + sym->st_name, undef_name))
+	  /* Not the symbol we are looking for.  */
+	  return NULL;
+
+	if (version != NULL)
+	  {
+	    if (__builtin_expect (verstab == NULL, 0))
+	      {
+		/* We need a versioned symbol but haven't found any.  If
+		   this is the object which is referenced in the verneed
+		   entry it is a bug in the library since a symbol must
+		   not simply disappear.
+
+		   It would also be a bug in the object since it means that
+		   the list of required versions is incomplete and so the
+		   tests in dl-version.c haven't found a problem.*/
+		assert (version->filename == NULL
+			|| ! _dl_name_match_p (version->filename, map));
+
+		/* Otherwise we accept the symbol.  */
+	      }
+	    else
+	      {
+		/* We can match the version information or use the
+		   default one if it is not hidden.  */
+		ElfW(Half) ndx = verstab[symidx] & 0x7fff;
+		if ((map->l_versions[ndx].hash != version->hash
+		     || strcmp (map->l_versions[ndx].name, version->name))
+		    && (version->hidden || map->l_versions[ndx].hash
+			|| (verstab[symidx] & 0x8000)))
+		  /* It's not the version we want.  */
+		  return NULL;
+	      }
+	  }
+	else
+	  {
+	    /* No specific version is selected.  There are two ways we
+	       can got here:
+
+	       - a binary which does not include versioning information
+	       is loaded
+
+	       - dlsym() instead of dlvsym() is used to get a symbol which
+	       might exist in more than one form
+
+	       If the library does not provide symbol version information
+	       there is no problem at at: we simply use the symbol if it
+	       is defined.
+
+	       These two lookups need to be handled differently if the
+	       library defines versions.  In the case of the old
+	       unversioned application the oldest (default) version
+	       should be used.  In case of a dlsym() call the latest and
+	       public interface should be returned.  */
+	    if (verstab != NULL)
+	      {
+		if ((verstab[symidx] & 0x7fff)
+		    >= ((flags & DL_LOOKUP_RETURN_NEWEST) ? 2 : 3))
+		  {
+		    /* Don't accept hidden symbols.  */
+		    if ((verstab[symidx] & 0x8000) == 0
+			&& num_versions++ == 0)
+		      /* No version so far.  */
+		      versioned_sym = sym;
+
+		    return NULL;
+		  }
+	      }
+	  }
+
+	/* There cannot be another entry for this symbol so stop here.  */
+	return sym;
+      }
 
-	  if (version != NULL)
-	    {
-	      if (__builtin_expect (verstab == NULL, 0))
-		{
-		  /* We need a versioned symbol but haven't found any.  If
-		     this is the object which is referenced in the verneed
-		     entry it is a bug in the library since a symbol must
-		     not simply disappear.
-
-		     It would also be a bug in the object since it means that
-		     the list of required versions is incomplete and so the
-		     tests in dl-version.c haven't found a problem.*/
-		  assert (version->filename == NULL
-			  || ! _dl_name_match_p (version->filename, map));
+      if (__builtin_expect (map->l_gnu_buckets != NULL, 1))
+	{
+	  /* Use the new GNU-style hash table.  */
+	  size_t chainoff = map->l_gnu_buckets[new_hash % map->l_nbuckets];
 
-		  /* Otherwise we accept the symbol.  */
-		}
-	      else
-		{
-		  /* We can match the version information or use the
-		     default one if it is not hidden.  */
-		  ElfW(Half) ndx = verstab[symidx] & 0x7fff;
-		  if ((map->l_versions[ndx].hash != version->hash
-		       || strcmp (map->l_versions[ndx].name, version->name))
-		      && (version->hidden || map->l_versions[ndx].hash
-			  || (verstab[symidx] & 0x8000)))
-		    /* It's not the version we want.  */
-		    continue;
-		}
-	    }
-	  else
+	  if (chainoff != 0xffffffff)
 	    {
-	      /* No specific version is selected.  There are two ways we
-		 can got here:
-
-		 - a binary which does not include versioning information
-		   is loaded
-
-		 - dlsym() instead of dlvsym() is used to get a symbol which
-		   might exist in more than one form
-
-		 If the library does not provide symbol version
-		 information there is no problem at at: we simply use the
-		 symbol if it is defined.
-
-		 These two lookups need to be handled differently if the
-		 library defines versions.  In the case of the old
-		 unversioned application the oldest (default) version
-		 should be used.  In case of a dlsym() call the latest and
-		 public interface should be returned.  */
-	      if (verstab != NULL)
+	      size_t n = map->l_gnu_hashbase[chainoff + 1];
+	      do
 		{
-		  if ((verstab[symidx] & 0x7fff)
-		      >= ((flags & DL_LOOKUP_RETURN_NEWEST) ? 2 : 3))
+		  --n;
+		  if (map->l_gnu_hashbase[chainoff + 2 + n] == new_hash)
 		    {
-		      /* Don't accept hidden symbols.  */
-		      if ((verstab[symidx] & 0x8000) == 0
-			  && num_versions++ == 0)
-			/* No version so far.  */
-			versioned_sym = sym;
-
-		      continue;
+		      symidx = map->l_gnu_hashbase[chainoff] + n;
+		      sym = check_match (&symtab[symidx]);
+		      if (sym != NULL)
+			goto found_it;
 		    }
 		}
+	      while (n > 0);
 	    }
+	}
+      else
+	{
+	  if (*old_hash == 0xffffffff)
+	    *old_hash = _dl_elf_hash (undef_name);
 
-	  /* There cannot be another entry for this symbol so stop here.  */
-	  goto found_it;
+	  /* Use the old SysV-style hash table.  Search the appropriate
+	     hash bucket in this object's symbol table for a definition
+	     for the same symbol name.  */
+	  for (symidx = map->l_buckets[*old_hash % map->l_nbuckets];
+	       symidx != STN_UNDEF;
+	       symidx = map->l_chain[symidx])
+	    {
+	      sym = check_match (&symtab[symidx]);
+	      if (sym != NULL)
+		goto found_it;
+	    }
 	}
 
       /* If we have seen exactly one versioned symbol while we are
--- libc/elf/dynamic-link.h	15 Mar 2005 22:57:25 -0000	1.54
+++ libc/elf/dynamic-link.h	26 Jun 2006 20:54:51 -0000
@@ -1,5 +1,5 @@
 /* Inline functions for dynamic linking.
-   Copyright (C) 1995-2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+   Copyright (C) 1995-2005, 2006 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -143,6 +143,8 @@
 # endif
       ADJUST_DYN_INFO (DT_JMPREL);
       ADJUST_DYN_INFO (VERSYMIDX (DT_VERSYM));
+      ADJUST_DYN_INFO (DT_ADDRTAGIDX (DT_GNU_HASH) + DT_NUM + DT_THISPROCNUM
+		       + DT_VERSIONTAGNUM + DT_EXTRANUM + DT_VALNUM);
 # undef ADJUST_DYN_INFO
       assert (cnt <= DL_RO_DYN_TEMP_CNT);
     }
--- libc/elf/elf.h	25 Feb 2006 01:57:44 -0000	1.154
+++ libc/elf/elf.h	26 Jun 2006 20:55:48 -0000
@@ -329,7 +329,8 @@
 #define SHT_GROUP	  17		/* Section group */
 #define SHT_SYMTAB_SHNDX  18		/* Extended section indeces */
 #define	SHT_NUM		  19		/* Number of defined types.  */
-#define SHT_LOOS	  0x60000000	/* Start OS-specific */
+#define SHT_LOOS	  0x60000000	/* Start OS-specific.  */
+#define SHT_GNU_HASH	  0x6ffffff6	/* GNU-style hash table.  */
 #define SHT_GNU_LIBLIST	  0x6ffffff7	/* Prelink library list */
 #define SHT_CHECKSUM	  0x6ffffff8	/* Checksum for DSO content.  */
 #define SHT_LOSUNW	  0x6ffffffa	/* Sun-specific low bound.  */
@@ -699,6 +700,9 @@
    If any adjustment is made to the ELF object after it has been
    built these entries will need to be adjusted.  */
 #define DT_ADDRRNGLO	0x6ffffe00
+#define DT_GNU_HASH	0x6ffffef5	/* GNU-style hash table.  */
+#define DT_TLSDESC_PLT	0x6ffffef6
+#define DT_TLSDESC_GOT	0x6ffffef7
 #define DT_GNU_CONFLICT	0x6ffffef8	/* Start of conflict section */
 #define DT_GNU_LIBLIST	0x6ffffef9	/* Library list */
 #define DT_CONFIG	0x6ffffefa	/* Configuration information.  */
@@ -709,7 +713,7 @@
 #define DT_SYMINFO	0x6ffffeff	/* Syminfo table.  */
 #define DT_ADDRRNGHI	0x6ffffeff
 #define DT_ADDRTAGIDX(tag)	(DT_ADDRRNGHI - (tag))	/* Reverse order! */
-#define DT_ADDRNUM 10
+#define DT_ADDRNUM 11
 
 /* The versioning entry types.  The next are defined as part of the
    GNU extension.  */
--- libc/include/link.h	1 Mar 2006 06:18:30 -0000	1.38
+++ libc/include/link.h	26 Jun 2006 20:56:35 -0000
@@ -124,7 +124,7 @@
     const ElfW(Phdr) *l_phdr;	/* Pointer to program header table in core.  */
     ElfW(Addr) l_entry;		/* Entry point location.  */
     ElfW(Half) l_phnum;		/* Number of program header entries.  */
-    ElfW(Half) l_ldnum;	/* Number of dynamic segment entries.  */
+    ElfW(Half) l_ldnum;		/* Number of dynamic segment entries.  */
 
     /* Array of DT_NEEDED dependencies and their dependencies, in
        dependency order for symbol lookup (with and without
@@ -141,7 +141,13 @@
 
     /* Symbol hash table.  */
     Elf_Symndx l_nbuckets;
-    const Elf_Symndx *l_buckets, *l_chain;
+    const Elf32_Word *l_gnu_buckets;
+    union
+    {
+      const Elf32_Word *l_gnu_hashbase;
+      const Elf_Symndx *l_chain;
+    };
+    const Elf_Symndx *l_buckets;
 
     unsigned int l_direct_opencount; /* Reference count for dlopen/dlclose.  */
     enum			/* Where this object came from.  */

[-- Attachment #4: prelink-gnu-hash.patch --]
[-- Type: text/plain, Size: 3580 bytes --]

--- prelink/src/prelink.h.jj	2006-06-20 15:12:16.000000000 +0200
+++ prelink/src/prelink.h	2006-06-27 21:50:11.000000000 +0200
@@ -44,6 +44,11 @@
 #define SHT_GNU_LIBLIST		0x6ffffff7
 #endif
 
+#ifndef DT_GNU_HASH
+#define DT_GNU_HASH		0x6ffffef5
+#define SHT_GNU_HASH		0x6ffffff6
+#endif
+
 struct prelink_entry;
 struct prelink_info;
 struct PLArch;
@@ -75,6 +80,7 @@ typedef struct
   GElf_Addr info_DT_GNU_PRELINKED;
   GElf_Addr info_DT_CHECKSUM;
   GElf_Addr info_DT_VERNEED, info_DT_VERDEF, info_DT_VERSYM;
+  GElf_Addr info_DT_GNU_HASH;
 #define DT_GNU_PRELINKED_BIT 50
 #define DT_CHECKSUM_BIT 51
 #define DT_VERNEED_BIT 52
@@ -83,6 +89,7 @@ typedef struct
 #define DT_FILTER_BIT 55
 #define DT_AUXILIARY_BIT 56
 #define DT_LOPROC_BIT 57
+#define DT_GNU_HASH_BIT 58
   uint64_t info_set_mask;
   int fd, fdro;
   int lastscn, dynamic;
--- prelink/src/prelink.c.jj	2005-06-10 17:09:06.000000000 +0200
+++ prelink/src/prelink.c	2006-06-27 21:53:20.000000000 +0200
@@ -1,4 +1,4 @@
-/* Copyright (C) 2001, 2002, 2003, 2004, 2005 Red Hat, Inc.
+/* Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
    Written by Jakub Jelinek <jakub@redhat.com>, 2001.
 
    This program is free software; you can redistribute it and/or modify
@@ -424,6 +424,7 @@ prelink_prepare (DSO *dso)
 	switch (dso->shdr[i].sh_type)
 	  {
 	  case SHT_HASH:
+	  case SHT_GNU_HASH:
 	  case SHT_DYNSYM:
 	  case SHT_REL:
 	  case SHT_RELA:
--- prelink/src/dso.c.jj	2006-06-21 11:46:34.000000000 +0200
+++ prelink/src/dso.c	2006-06-27 21:51:01.000000000 +0200
@@ -102,6 +102,11 @@ read_dynamic (DSO *dso)
 		  dso->info_set_mask |= (1ULL << DT_AUXILIARY_BIT);
 		else if (dyn.d_tag == DT_LOPROC)
 		  dso->info_set_mask |= (1ULL << DT_LOPROC_BIT);
+		else if (dyn.d_tag == DT_GNU_HASH)
+		  {
+		    dso->info_DT_GNU_HASH = dyn.d_un.d_val;
+		    dso->info_set_mask |= (1ULL << DT_GNU_HASH_BIT);
+		  }
 	      }
 	    if (ndx < maxndx)
 	      break;
@@ -1361,6 +1366,7 @@ adjust_dso (DSO *dso, GElf_Addr start, G
 	    return 1;
 	  break;
 	case SHT_HASH:
+	case SHT_GNU_HASH:
 	case SHT_NOBITS:
 	case SHT_STRTAB:
 	  break;
--- prelink/src/exec.c.jj	2006-05-22 16:33:43.000000000 +0200
+++ prelink/src/exec.c	2006-06-27 21:52:39.000000000 +0200
@@ -61,7 +61,11 @@ update_dynamic_tags (DSO *dso, GElf_Shdr
 	  || (dynamic_info_is_set (dso, DT_VERSYM_BIT)
 	      && dso->info_DT_VERSYM == old_shdr[j].sh_addr
 	      && old_shdr[j].sh_type == SHT_GNU_versym
-	      && set_dynamic (dso, DT_VERSYM, shdr[i].sh_addr, 1)))
+	      && set_dynamic (dso, DT_VERSYM, shdr[i].sh_addr, 1))
+	  || (dynamic_info_is_set (dso, DT_GNU_HASH_BIT)
+	      && dso->info_DT_GNU_HASH == old_shdr[j].sh_addr
+	      && old_shdr[j].sh_type == SHT_GNU_HASH
+	      && set_dynamic (dso, DT_GNU_HASH, shdr[i].sh_addr, 1)))
 	return 1;
     }
 
--- prelink/src/space.c.jj	2006-05-22 16:27:00.000000000 +0200
+++ prelink/src/space.c	2006-06-27 21:48:06.000000000 +0200
@@ -60,6 +60,7 @@ print_sections (DSO *dso, GElf_Ehdr *ehd
       { SHT_GNU_verneed, "VERNEED" },
       { SHT_GNU_versym, "VERSYM" },
       { SHT_GNU_LIBLIST, "LIBLIST" },
+      { SHT_GNU_HASH, "GNU_HASH" },
       { 0, NULL }
     };
 
@@ -181,6 +182,7 @@ readonly_is_movable (DSO *dso, GElf_Ehdr
   switch (shdr[k].sh_type)
     {
     case SHT_HASH:
+    case SHT_GNU_HASH:
     case SHT_DYNSYM:
     case SHT_REL:
     case SHT_RELA:
@@ -527,6 +529,7 @@ find_readonly_space (DSO *dso, GElf_Shdr
 	      switch (shdr[j].sh_type)
 		{
 		case SHT_HASH:
+		case SHT_GNU_HASH:
 		case SHT_DYNSYM:
 		case SHT_STRTAB:
 		case SHT_GNU_verdef:

[-- Attachment #5: stats --]
[-- Type: text/plain, Size: 14567 bytes --]

cat > a.c <<\EOF
cat /tmp/a.c
#include <dlfcn.h>
const char *libs[] = {
"libvclplug_gtk680lx.so", "libvclplug_gen680lx.so", "libnss_files.so.2", "libGL.so.1", "servicemgr.uno.so",
"shlibloader.uno.so", "simplereg.uno.so", "nestedreg.uno.so", "typemgr.uno.so", "implreg.uno.so",
"security.uno.so", "libreg.so.3", "libstore.so.3", "regtypeprov.uno.so", "configmgr2.uno.so",
"typeconverter.uno.so", "gconfbe1.uno.so", "behelper.uno.so", "sax.uno.so", "localebe1.uno.so",
"uriproc.uno.so", "libspl680lx.so", "libucb1.so", "ucpgvfs1.uno.so", "libgcc3_uno.so", "libpackage2.so",
"libfileacc.so", "libuui680lx.so", "libfilterconfig1.so", "libdtransX11680lx.so", "i18npool.uno.so",
"liblocaledata_en.so", "fsstorage.uno.so", "libxstor.so", "libdbtools680lx.so", "libcups.so.2",
"libgnutls.so.13", "libgcrypt.so.11", "libgpg-error.so.0", "libmcnttype.so", "libucpchelp1.so",
"svtmisc.uno.so" };
int
main (int argc, char **argv)
{
  int i;
  void *h;
  int flags = RTLD_LAZY;
  if (argv[1][0] == 'g')
    flags |= RTLD_GLOBAL;
  for (i = 0; i < sizeof (libs) / sizeof (libs[0]); ++i)
    h = dlopen (libs[i], flags);
  return 0;
}
EOF
gcc -g -O2 -o a a.c -Wl,-rpath,/usr/lib64/openoffice.org2.0/program/ \
  -L/usr/lib64/openoffice.org2.0/program/ -lsoffice -lsw680lx -lsvx680lx -lstdc++ -lm -shared-libgcc
for V in local global; do for M in '' 'export LD_X=1' 'export LD_BIND_NOW=1' 'export LD_X=1 LD_BIND_NOW=1'; \
  do ( for i in 1 2 3 4; do eval $M; time ./a $V; done 2>&1 > /dev/null | \
    awk 'BEGIN { printf "'"$V $M"'\t" } /^real/ { printf "%s ", $2 } END { printf "\n" }' ); done; done

local					0m0.264s 0m0.253s 0m0.256s 0m0.256s
local export LD_X=1			0m0.544s 0m0.538s 0m0.538s 0m0.537s
local export LD_BIND_NOW=1		0m0.480s 0m0.474s 0m0.477s 0m0.480s
local export LD_X=1 LD_BIND_NOW=1	0m1.102s 0m1.094s 0m1.096s 0m1.095s
global					0m0.301s 0m0.299s 0m0.294s 0m0.294s
global export LD_X=1			0m0.625s 0m0.619s 0m0.619s 0m0.618s
global export LD_BIND_NOW=1		0m0.553s 0m0.546s 0m0.544s 0m0.544s
global export LD_X=1 LD_BIND_NOW=1	0m1.251s 0m1.245s 0m1.244s 0m1.243s

for V in local global; do for M in '' 'export LD_X=1' 'export LD_BIND_NOW=1' 'export LD_X=1 LD_BIND_NOW=1'; \
  do ( echo "$V $M"; eval $M; valgrind --tool=cachegrind ./a $V 2>&1 > /dev/null | sed -n '/== I   refs/,$p' ); \
    done; done

local 
==11628== I   refs:      213,572,489
==11628== I1  misses:         11,630
==11628== L2i misses:         10,103
==11628== I1  miss rate:        0.00%
==11628== L2i miss rate:        0.00%
==11628== 
==11628== D   refs:       78,630,135  (62,272,247 rd + 16,357,888 wr)
==11628== D1  misses:      4,699,115  ( 4,544,371 rd +    154,744 wr)
==11628== L2d misses:        643,429  (   549,365 rd +     94,064 wr)
==11628== D1  miss rate:         5.9% (       7.2%   +        0.9%  )
==11628== L2d miss rate:         0.8% (       0.8%   +        0.5%  )
==11628== 
==11628== L2 refs:         4,710,745  ( 4,556,001 rd +    154,744 wr)
==11628== L2 misses:         653,532  (   559,468 rd +     94,064 wr)
==11628== L2 miss rate:          0.2% (       0.2%   +        0.5%  )
local export LD_X=1
==11632== I   refs:      306,655,479
==11632== I1  misses:         11,612
==11632== L2i misses:         10,459
==11632== I1  miss rate:        0.00%
==11632== L2i miss rate:        0.00%
==11632== 
==11632== D   refs:      129,271,101  (99,462,385 rd + 29,808,716 wr)
==11632== D1  misses:      9,739,970  ( 9,576,214 rd +    163,756 wr)
==11632== L2d misses:      3,035,531  ( 2,930,229 rd +    105,302 wr)
==11632== D1  miss rate:         7.5% (       9.6%   +        0.5%  )
==11632== L2d miss rate:         2.3% (       2.9%   +        0.3%  )
==11632== 
==11632== L2 refs:         9,751,582  ( 9,587,826 rd +    163,756 wr)
==11632== L2 misses:       3,045,990  ( 2,940,688 rd +    105,302 wr)
==11632== L2 miss rate:          0.6% (       0.7%   +        0.3%  )
local export LD_BIND_NOW=1
==11638== I   refs:      416,076,941
==11638== I1  misses:         11,145
==11638== L2i misses:          9,847
==11638== I1  miss rate:        0.00%
==11638== L2i miss rate:        0.00%
==11638== 
==11638== D   refs:      156,764,733  (123,796,220 rd + 32,968,513 wr)
==11638== D1  misses:      9,682,235  (  9,503,136 rd +    179,099 wr)
==11638== L2d misses:        967,489  (    865,728 rd +    101,761 wr)
==11638== D1  miss rate:         6.1% (        7.6%   +        0.5%  )
==11638== L2d miss rate:         0.6% (        0.6%   +        0.3%  )
==11638== 
==11638== L2 refs:         9,693,380  (  9,514,281 rd +    179,099 wr)
==11638== L2 misses:         977,336  (    875,575 rd +    101,761 wr)
==11638== L2 miss rate:          0.1% (        0.1%   +        0.3%  )
local export LD_X=1 LD_BIND_NOW=1
==11643== I   refs:      612,287,612
==11643== I1  misses:         11,141
==11643== L2i misses:         10,057
==11643== I1  miss rate:        0.00%
==11643== L2i miss rate:        0.00%
==11643== 
==11643== D   refs:      264,754,881  (202,680,154 rd + 62,074,727 wr)
==11643== D1  misses:     20,634,045  ( 20,436,902 rd +    197,143 wr)
==11643== L2d misses:      6,327,654  (  6,214,729 rd +    112,925 wr)
==11643== D1  miss rate:         7.7% (       10.0%   +        0.3%  )
==11643== L2d miss rate:         2.3% (        3.0%   +        0.1%  )
==11643== 
==11643== L2 refs:        20,645,186  ( 20,448,043 rd +    197,143 wr)
==11643== L2 misses:       6,337,711  (  6,224,786 rd +    112,925 wr)
==11643== L2 miss rate:          0.7% (        0.7%   +        0.1%  )
global 
==11647== I   refs:      229,660,039
==11647== I1  misses:         11,781
==11647== L2i misses:         10,255
==11647== I1  miss rate:        0.00%
==11647== L2i miss rate:        0.00%
==11647== 
==11647== D   refs:       86,649,339  (68,557,134 rd + 18,092,205 wr)
==11647== D1  misses:      6,704,681  ( 6,545,220 rd +    159,461 wr)
==11647== L2d misses:        685,354  (   590,853 rd +     94,501 wr)
==11647== D1  miss rate:         7.7% (       9.5%   +        0.8%  )
==11647== L2d miss rate:         0.7% (       0.8%   +        0.5%  )
==11647== 
==11647== L2 refs:         6,716,462  ( 6,557,001 rd +    159,461 wr)
==11647== L2 misses:         695,609  (   601,108 rd +     94,501 wr)
==11647== L2 miss rate:          0.2% (       0.2%   +        0.5%  )
global export LD_X=1
==11651== I   refs:      331,688,345
==11651== I1  misses:         11,730
==11651== L2i misses:         10,602
==11651== I1  miss rate:        0.00%
==11651== L2i miss rate:        0.00%
==11651== 
==11651== D   refs:      142,641,436  (109,595,921 rd + 33,045,515 wr)
==11651== D1  misses:     12,232,659  ( 12,067,731 rd +    164,928 wr)
==11651== L2d misses:      3,522,116  (  3,416,331 rd +    105,785 wr)
==11651== D1  miss rate:         8.5% (       11.0%   +        0.4%  )
==11651== L2d miss rate:         2.4% (        3.1%   +        0.3%  )
==11651== 
==11651== L2 refs:        12,244,389  ( 12,079,461 rd +    164,928 wr)
==11651== L2 misses:       3,532,718  (  3,426,933 rd +    105,785 wr)
==11651== L2 miss rate:          0.7% (        0.7%   +        0.3%  )
global export LD_BIND_NOW=1
==11656== I   refs:      445,261,358
==11656== I1  misses:         11,280
==11656== L2i misses:          9,978
==11656== I1  miss rate:        0.00%
==11656== L2i miss rate:        0.00%
==11656== 
==11656== D   refs:      171,275,049  (135,170,564 rd + 36,104,485 wr)
==11656== D1  misses:     13,300,976  ( 13,111,867 rd +    189,109 wr)
==11656== L2d misses:      1,045,200  (    943,012 rd +    102,188 wr)
==11656== D1  miss rate:         7.7% (        9.7%   +        0.5%  )
==11656== L2d miss rate:         0.6% (        0.6%   +        0.2%  )
==11656== 
==11656== L2 refs:        13,312,256  ( 13,123,147 rd +    189,109 wr)
==11656== L2 misses:       1,055,178  (    952,990 rd +    102,188 wr)
==11656== L2 miss rate:          0.1% (        0.1%   +        0.2%  )
global export LD_X=1 LD_BIND_NOW=1
==11660== I   refs:      657,215,295
==11660== I1  misses:         11,238
==11660== L2i misses:         10,165
==11660== I1  miss rate:        0.00%
==11660== L2i miss rate:        0.00%
==11660== 
==11660== D   refs:      288,810,775  (220,892,186 rd + 67,918,589 wr)
==11660== D1  misses:     25,132,151  ( 24,931,250 rd +    200,901 wr)
==11660== L2d misses:      7,240,360  (  7,126,874 rd +    113,486 wr)
==11660== D1  miss rate:         8.7% (       11.2%   +        0.2%  )
==11660== L2d miss rate:         2.5% (        3.2%   +        0.1%  )
==11660== 
==11660== L2 refs:        25,143,389  ( 24,942,488 rd +    200,901 wr)
==11660== L2 misses:       7,250,525  (  7,137,039 rd +    113,486 wr)
==11660== L2 miss rate:          0.7% (        0.8%   +        0.1%  )

for V in local global; do for M in '' '-E LD_X=1' '-E LD_BIND_NOW=1' '-E LD_X=1 -E LD_BIND_NOW=1'; \
  do ( echo "$V $M"; ./timing $M ./a $V ); done; done

local
Strip out best and worst realtime result
minimum: 0.252914000 sec real / 0.000051294 sec CPU
maximum: 0.269686000 sec real / 0.000083306 sec CPU
average: 0.254617928 sec real / 0.000071702 sec CPU
stdev  : 0.000890554 sec real / 0.000003730 sec CPU
local -E LD_X=1
optarg="LD_X=1"
Strip out best and worst realtime result
minimum: 0.536379000 sec real / 0.000050866 sec CPU
maximum: 0.539256000 sec real / 0.000079972 sec CPU
average: 0.537778428 sec real / 0.000074764 sec CPU
stdev  : 0.000612980 sec real / 0.000002034 sec CPU
local -E LD_BIND_NOW=1
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.470151000 sec real / 0.000053946 sec CPU
maximum: 0.481664000 sec real / 0.000084505 sec CPU
average: 0.473882142 sec real / 0.000073921 sec CPU
stdev  : 0.001887639 sec real / 0.000002616 sec CPU
local -E LD_X=1 -E LD_BIND_NOW=1
optarg="LD_X=1"
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 1.092469000 sec real / 0.000051647 sec CPU
maximum: 1.106560000 sec real / 0.000078219 sec CPU
average: 1.096268250 sec real / 0.000064646 sec CPU
stdev  : 0.002515850 sec real / 0.000003027 sec CPU
global
Strip out best and worst realtime result
minimum: 0.294585000 sec real / 0.000050279 sec CPU
maximum: 0.304168000 sec real / 0.000078209 sec CPU
average: 0.297781285 sec real / 0.000072901 sec CPU
stdev  : 0.002508159 sec real / 0.000004136 sec CPU
global -E LD_X=1
optarg="LD_X=1"
Strip out best and worst realtime result
minimum: 0.617157000 sec real / 0.000064151 sec CPU
maximum: 0.645039000 sec real / 0.000084488 sec CPU
average: 0.621962785 sec real / 0.000075530 sec CPU
stdev  : 0.002484547 sec real / 0.000003147 sec CPU
global -E LD_BIND_NOW=1
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.544103000 sec real / 0.000052304 sec CPU
maximum: 0.557447000 sec real / 0.000078790 sec CPU
average: 0.548014107 sec real / 0.000073886 sec CPU
stdev  : 0.002805780 sec real / 0.000002697 sec CPU
global -E LD_X=1 -E LD_BIND_NOW=1
optarg="LD_X=1"
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 1.241722000 sec real / 0.000058554 sec CPU
maximum: 1.255916000 sec real / 0.000076953 sec CPU
average: 1.247884071 sec real / 0.000063511 sec CPU
stdev  : 0.003259242 sec real / 0.000002160 sec CPU

/usr/sbin/prelink -vmR ./a
for V in local global; do for M in '' 'export LD_X=1' 'export LD_BIND_NOW=1' 'export LD_X=1 LD_BIND_NOW=1'; \
  do ( for i in 1 2 3 4; do eval $M; time ./a $V; done 2>&1 > /dev/null | \
    awk 'BEGIN { printf "'"$V $M"'\t" } /^real/ { printf "%s ", $2 } END { printf "\n" }' ); done; done

local					0m0.145s 0m0.138s 0m0.139s 0m0.139s
local export LD_X=1			0m0.274s 0m0.268s 0m0.266s 0m0.269s
local export LD_BIND_NOW=1		0m0.245s 0m0.238s 0m0.238s 0m0.239s
local export LD_X=1 LD_BIND_NOW=1	0m0.504s 0m0.497s 0m0.498s 0m0.496s
global					0m0.182s 0m0.175s 0m0.174s 0m0.175s
global export LD_X=1			0m0.352s 0m0.357s 0m0.344s 0m0.346s
global export LD_BIND_NOW=1		0m0.310s 0m0.305s 0m0.316s 0m0.306s
global export LD_X=1 LD_BIND_NOW=1	0m0.647s 0m0.641s 0m0.640s 0m0.640s

# valgrind --tool=cachegrind stats not provided for prelinked testcase,
# as valgrind apparently uses LD_PRELOAD internally and thus prevents
# prelinking.

for V in local global; do for M in '' '-E LD_X=1' '-E LD_BIND_NOW=1' '-E LD_X=1 -E LD_BIND_NOW=1'; \
  do ( echo "$V $M"; ./timing $M ./a $V ); done; done

local
Strip out best and worst realtime result
minimum: 0.137495000 sec real / 0.000066247 sec CPU
maximum: 0.142180000 sec real / 0.000086736 sec CPU
average: 0.138369035 sec real / 0.000072997 sec CPU
stdev  : 0.000575184 sec real / 0.000002132 sec CPU
local -E LD_X=1
optarg="LD_X=1"
Strip out best and worst realtime result
minimum: 0.264590000 sec real / 0.000060576 sec CPU
maximum: 0.272804000 sec real / 0.000082688 sec CPU
average: 0.266598571 sec real / 0.000072811 sec CPU
stdev  : 0.001817765 sec real / 0.000003394 sec CPU
local -E LD_BIND_NOW=1
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.236854000 sec real / 0.000065925 sec CPU
maximum: 0.245201000 sec real / 0.000080373 sec CPU
average: 0.238382678 sec real / 0.000075591 sec CPU
stdev  : 0.000959453 sec real / 0.000002887 sec CPU
local -E LD_X=1 -E LD_BIND_NOW=1
optarg="LD_X=1"
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.496607000 sec real / 0.000065955 sec CPU
maximum: 0.512757000 sec real / 0.000084887 sec CPU
average: 0.498181678 sec real / 0.000074275 sec CPU
stdev  : 0.001529594 sec real / 0.000002630 sec CPU
global
Strip out best and worst realtime result
minimum: 0.173740000 sec real / 0.000048699 sec CPU
maximum: 0.181163000 sec real / 0.000083410 sec CPU
average: 0.175901500 sec real / 0.000070443 sec CPU
stdev  : 0.001745144 sec real / 0.000003656 sec CPU
global -E LD_X=1
optarg="LD_X=1"
Strip out best and worst realtime result
minimum: 0.344016000 sec real / 0.000058830 sec CPU
maximum: 0.377289000 sec real / 0.000076792 sec CPU
average: 0.346814392 sec real / 0.000072660 sec CPU
stdev  : 0.002058835 sec real / 0.000002517 sec CPU
global -E LD_BIND_NOW=1
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.304208000 sec real / 0.000049604 sec CPU
maximum: 0.314217000 sec real / 0.000077094 sec CPU
average: 0.307348821 sec real / 0.000071335 sec CPU
stdev  : 0.002641413 sec real / 0.000003427 sec CPU
global -E LD_X=1 -E LD_BIND_NOW=1
optarg="LD_X=1"
optarg="LD_BIND_NOW=1"
Strip out best and worst realtime result
minimum: 0.640543000 sec real / 0.000044401 sec CPU
maximum: 0.664382000 sec real / 0.000089763 sec CPU
average: 0.646539678 sec real / 0.000071135 sec CPU
stdev  : 0.005879177 sec real / 0.000003697 sec CPU

[-- Attachment #6: glibc-gnu-hash-hack.patch --]
[-- Type: text/plain, Size: 1077 bytes --]

--- libc/elf/dl-lookup.c	2006-06-27 10:12:22.000000000 +0200
+++ libc/elf/dl-lookup.c	27 Jun 2006 14:59:07 -0000
@@ -364,7 +364,13 @@
   Elf_Symndx *hash;
   Elf_Symndx nchain;
 
-  if (__builtin_expect (map->l_info[DT_ADDRTAGIDX (DT_GNU_HASH) + DT_NUM
+#ifdef SHARED
+  extern int nognubuckets;
+#else
+#define nognubuckets 0
+#endif
+  if (!nognubuckets &&
+      __builtin_expect (map->l_info[DT_ADDRTAGIDX (DT_GNU_HASH) + DT_NUM
   				    + DT_THISPROCNUM + DT_VERSIONTAGNUM
 				    + DT_EXTRANUM + DT_VALNUM] != NULL, 1))
     {
--- libc/elf/rtld.c.~1.362.~	2006-04-08 12:50:07.000000000 -0700
+++ libc/elf/rtld.c	2006-06-27 07:54:51.000000000 -0700
@@ -2493,6 +2493,7 @@ process_dl_audit (char *str)
 extern char **_environ attribute_hidden;
 
 
+int nognubuckets;
 static void
 process_envvars (enum mode *modep)
 {
@@ -2520,6 +2521,11 @@ process_envvars (enum mode *modep)
 
       switch (len)
 	{
+	case 1:
+	  if (envline[0] == 'X')
+	    nognubuckets = 1;
+	  break;
+
 	case 4:
 	  /* Warning level, verbose or not.  */
 	  if (memcmp (envline, "WARN", 4) == 0)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 17:21 [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement Jakub Jelinek
@ 2006-06-28 19:10 ` Paul Eggert
  2006-06-28 20:25   ` Roland McGrath
  2006-06-28 21:40   ` Jakub Jelinek
  2006-06-29 19:39 ` Michael Meeks
  1 sibling, 2 replies; 15+ messages in thread
From: Paul Eggert @ 2006-06-28 19:10 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: binutils, libc-alpha, Michael Meeks

Jakub Jelinek <jakub@redhat.com> writes:

> (Dan Bernstein's string hash function posted eons ago on comp.lang.c.)

Bernstein now prefers XOR, i.e.,
"h = h * 33 ^ c;" instead of
"h = h * 33 + c;".  Did you try that as well?
Several sources indicate it's a bit better, e.g.,
<http://eternallyconfuzzled.com/tuts/hashing.html#djb2>.

> We have tested a bunch of different hash functions

Which hash functions did you try?  One-at-a-time?  FNV?  You'll
probably get hassled by hash triviists (like me :-) no matter which
function you choose, but you can forstall that to some extent by
mentioning which functions you tested.

(Thanks for doing all this, by the way.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 19:10 ` Paul Eggert
@ 2006-06-28 20:25   ` Roland McGrath
  2006-06-28 21:40   ` Jakub Jelinek
  1 sibling, 0 replies; 15+ messages in thread
From: Roland McGrath @ 2006-06-28 20:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jakub Jelinek, binutils, libc-alpha, Michael Meeks

> Bernstein now prefers XOR, i.e.,
> "h = h * 33 ^ c;" instead of
> "h = h * 33 + c;".  Did you try that as well?
> Several sources indicate it's a bit better, e.g.,
> <http://eternallyconfuzzled.com/tuts/hashing.html#djb2>.

We have not tried that function.

> > We have tested a bunch of different hash functions
> 
> Which hash functions did you try?  One-at-a-time?  FNV?  You'll
> probably get hassled by hash triviists (like me :-) no matter which
> function you choose, but you can forstall that to some extent by
> mentioning which functions you tested.

We tried the original ELF hash, the htab_hash_string function from
libiberty, and the full_name_hash function from Linux's <linux/dcache.h>.
We compared them using a real-world representative sample of symbol name
strings (collected from the dynamic symbols in a GNU/Linux installation).
We looked at the number of hash values shared by two symbols, the number
shared by three symbols (none were shared by more than three in our sample
set using any function we tried), and the length of common prefixes shared
by colliding symbols.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 19:10 ` Paul Eggert
  2006-06-28 20:25   ` Roland McGrath
@ 2006-06-28 21:40   ` Jakub Jelinek
  2006-06-28 21:46     ` Paul Eggert
  1 sibling, 1 reply; 15+ messages in thread
From: Jakub Jelinek @ 2006-06-28 21:40 UTC (permalink / raw)
  To: Paul Eggert; +Cc: binutils, libc-alpha, Michael Meeks

On Wed, Jun 28, 2006 at 11:45:44AM -0700, Paul Eggert wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> 
> > (Dan Bernstein's string hash function posted eons ago on comp.lang.c.)
> 
> Bernstein now prefers XOR, i.e.,
> "h = h * 33 ^ c;" instead of
> "h = h * 33 + c;".  Did you try that as well?
> Several sources indicate it's a bit better, e.g.,
> <http://eternallyconfuzzled.com/tuts/hashing.html#djb2>.
> 
> > We have tested a bunch of different hash functions
> 
> Which hash functions did you try?  One-at-a-time?  FNV?  You'll
> probably get hassled by hash triviists (like me :-) no matter which
> function you choose, but you can forstall that to some extent by
> mentioning which functions you tested.

Ok, added a few further functions from the URL you referenced (previously
just had sysv, libiberty, dcache, djb2 and sdbm).
The number of collisions in the 537403 symbols is:
name		2sym collision #	3sym collision #	more than 3sym collision #
sysv		1749			5
libiberty	42
dcache		37
djb2		29
sdbm		39
djb3		31
rot		337			39			61
sax		34
fnv		40
oat		30
Speed and number of collisions on the set of all symbols with
full 32-bit hash values (i.e. not modulo nbuckets) are the most important
factors in this case, as handling such collisions is expensive.
Also, djb2 (i.e. Bernstein with +) gave up max common prefix len
3, while djb3 gives 4, fnv 4 as well, oat and sax too.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

unsigned int
elf_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int g, h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    {
      h = (h << 4) + ch;
      if ((g = (h & 0xf0000000)) != 0)
        {
          h ^= g >> 24;
          h ^= g;
        }
    }
  return h;
}

unsigned int
libiberty_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    h = h * 67 + ch - 113;

  return h;
}

unsigned int
dcache_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    h = (h + (ch << 4) + (ch >> 4)) * 11;

  return h;
}

unsigned int
djb2_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 5381;
  int ch;

  while ((ch = *name++) != '\0')
    h = (h << 5) + h + ch;

  return h;
}

unsigned int
sdbm_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    h = ch + (h << 6) + (h << 16) - h;

  return h;
}

unsigned int
djb3_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 5381;
  int ch;

  while ((ch = *name++) != '\0')
    h = ((h << 5) + h) ^ ch;

  return h;
}

unsigned int
rot_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    h = ch ^ (h << 4) ^ (h >> 28);

  return h;
}

unsigned int
sax_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 0;
  int ch;

  while ((ch = *name++) != '\0')
    h ^= ch ^ (h << 5) + (h >> 2);

  return h;
}

unsigned int
fnv_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 2166136261;
  int ch;

  while ((ch = *name++) != '\0')
    h = (h * 16777619) ^ ch;

  return h;
}

unsigned int
oat_hash (const char *namearg)
{
  const unsigned char *name = (const unsigned char *) namearg;
  unsigned int h = 2166136261;
  int ch;

  while ((ch = *name++) != '\0')
    {
      h += ch;
      h += (h << 10);
      h ^= (h >> 6);
    }

  h += (h << 3);
  h ^= (h >> 11);
  h += (h << 15);

  return h;
}

int
main (int argc, char **argv)
{
  char *line = NULL;
  size_t len = 0;
  ssize_t n;
  unsigned int (*hash) (const char *);
  if (strcmp (argv[1], "sysv") == 0)
    hash = elf_hash;
  else if (strcmp (argv[1], "libiberty") == 0)
    hash = libiberty_hash;
  else if (strcmp (argv[1], "dcache") == 0)
    hash = dcache_hash;
  else if (strcmp (argv[1], "djb2") == 0)
    hash = djb2_hash;
  else if (strcmp (argv[1], "sdbm") == 0)
    hash = sdbm_hash;
  else if (strcmp (argv[1], "djb3") == 0)
    hash = djb3_hash;
  else if (strcmp (argv[1], "rot") == 0)
    hash = rot_hash;
  else if (strcmp (argv[1], "sax") == 0)
    hash = sax_hash;
  else if (strcmp (argv[1], "fnv") == 0)
    hash = fnv_hash;
  else if (strcmp (argv[1], "oat") == 0)
    hash = oat_hash;
  else
    return 1;
  while ((n = getline (&line, &len, stdin)) > 0)
    {
      if (line[n - 1] == '\n')
	line[n - 1] = '\0';
      printf ("%08x %s\n", hash (line), line);
    }
  return 0;
}

	Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 21:40   ` Jakub Jelinek
@ 2006-06-28 21:46     ` Paul Eggert
  2006-06-28 21:49       ` Roland McGrath
  2006-06-29  0:35       ` Ulrich Drepper
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Eggert @ 2006-06-28 21:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: binutils, libc-alpha, Michael Meeks

Jakub Jelinek <jakub@redhat.com> writes:

> Ok, added a few further functions from the URL you referenced (previously
> just had sysv, libiberty, dcache, djb2 and sdbm).

Thanks for doing that analysis.

One other (perhaps dumb) question: why limit yourself to a 32-bit hash
on machines with 64-bit addresses?  Since the application area
(linking executables) is already architecture dependent, is there a
downside to going with 64-bit hashes on such machines?

(Hope you don't mind the trivia questions from the peanut gallery....)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 21:46     ` Paul Eggert
@ 2006-06-28 21:49       ` Roland McGrath
  2006-06-29  0:35       ` Ulrich Drepper
  1 sibling, 0 replies; 15+ messages in thread
From: Roland McGrath @ 2006-06-28 21:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jakub Jelinek, binutils, libc-alpha, Michael Meeks

> One other (perhaps dumb) question: why limit yourself to a 32-bit hash
> on machines with 64-bit addresses?  Since the application area
> (linking executables) is already architecture dependent, is there a
> downside to going with 64-bit hashes on such machines?

It halves the number of table entries that can fit in the machine's cache.
To overcome the memory and cache impact of doubling the table size, there
would have to be very, very common collisions of the 32-bit hash that don't
collide with a proposed 64-bit function.  Since we can select a 32-bit hash
where collisions are acceptably rare in practice, it seems unlikely that
this tradeoff would ever balance.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 21:46     ` Paul Eggert
  2006-06-28 21:49       ` Roland McGrath
@ 2006-06-29  0:35       ` Ulrich Drepper
  1 sibling, 0 replies; 15+ messages in thread
From: Ulrich Drepper @ 2006-06-29  0:35 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jakub Jelinek, binutils, libc-alpha, Michael Meeks

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

Paul Eggert wrote:
> One other (perhaps dumb) question: why limit yourself to a 32-bit hash
> on machines with 64-bit addresses?

Waste of space.  Not only on disk, especially on memory.  The goal is to
have the entire chain in a cache line.  We have already good results
with 32 bits, wasting that space is simply not worth it.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-28 17:21 [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement Jakub Jelinek
  2006-06-28 19:10 ` Paul Eggert
@ 2006-06-29 19:39 ` Michael Meeks
  2006-06-29 21:52   ` Jakub Jelinek
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Michael Meeks @ 2006-06-29 19:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: binutils, libc-alpha, Ulrich Drepper

Hi Jakub,

On Wed, 2006-06-28 at 19:09 +0200, Jakub Jelinek wrote:
> The following patches introduce an optional ELF hash section

	Dudie; you rock ! - thanks for taking this on board; this is really
nice :-) and here was I thinking I was going to have to write a length
paper to try to convince Ulrich :-) You made my month (really).

> The initial design was done by Ulrich Drepper and was discussed with
> Michael Meeks a few months ago as well.  But nothing came off of these
> discussions and the proposed approach is quite different.

	This looks rather like -zdynsort + -zhash-vals prototype; but I'm too
pleased to quibble :-)

	Anyhow - I've been thinking about various other optimisations, many of
which we can easily & neatly fit into this nice framework, I've numbered
them to help chewing them over:

	1. Extensibility
		+ I want to see a multiply per lookup.
		+ ie. since the L2 cache misses are what hammer us -
		  if [sic] we want to extend the .chain -Bdirect
		  support in future - being able to put that in
		  the same record is really important.
		+ a 'record size' field gives us that with ~no
		  bloat as of now.
		+ [ and I really think some form of direct linking
		  is the future - but we can discuss that later 
		  clearly ]
		+ ie. having an extensibility story that is better
		  than 'just add a new section [ somewhere miles
		  away in memory/not-in-cache ] is not so fun.

	2. chain walking:
		+ the 'not having a .chain <-> .dynsym offset 
		  equivalence is an interesting innovation over my
		  approach; having said that I'm not sure why
		  you did that. Most chains are rather short - and
		  adding a set of bytes that are almost always
		  length 1/2/3 seems strange; may I suggest a
		  better approach ?
		+ Since we can sort (almost all) of .dynsym -
		  there is no problem ensuring that ~all symbols
		  that we don't want in .hash occur at the end
		  of the .dynsym table [ cf. the next UNDEF point ].
		+ Furthermore - by inspection it can be seen that
		  1/2^32 ~=== 1/2^30 [1]
			+ using this theorem, we can easily steal
			  1/2 bits from the top of the stored hash
			  and use them as chain terminators.
			+ ie. lets have:

.hash[symhash % nbuckets] -> .chain[5]

.chain[5] -> [ (not-end) | hashval[5] & 0x3fff ]
.chain[6] -> [ (not-end) | hashval[6] & 0x3fff ]
.chain[7] -> [ (is-end)  | hashval[7] & 0x3fff ]

			+ that saves us several bytes per chain,
			  and cf. above, no measurable loss in
			  performance. [ of course chain[5] 
			  <-> .dynsym[5] here as before ]

		+ in fact, for extensibility a per-shlib hash
		  comparison 'mask' field would make for a rather
		  better behaviour - so we can snarf bits from
		  here in future if we come up with brighter ideas.

	3. What we hash:
		+ it is incredible to me that UNDEF symbols are
		  included in the symbol hash; I see no legitimate
		  use for that [ eg. 'free' is an expensive hit in the
		  hash for virtually every app ;-]
		+ with 2 bits, a 'chain terminator' and a
		  'defined terminator' if we don't bin the undef's
		  from the hash completely we can at least sort
		  them to the end of the chain and often avoid
		  comparing them - [ currently they would be
		  filtered out only in check_match after a .dynsym
		  check for "if ((sym->st_value == 0 /* No value.  */...
		      || (type_class & (sym->st_shndx == SHN_UNDEF)))"
		+ This may seem an insignificant win: most
		  'normal' libs I've seen only import 1/3rd
		  of their symbols
			+ OTOH - C++ 'plugins' [ which we can't
			  load RTLD_LOCAL because of vague symbols
			  tend to have map files / visibility markup 
			  to reduce their exposed symbol count
			  substantially, eg.
			+ as an example 'writer' lives in 'libsw' using
			  objdump -T libsw680li.so  | grep UND | wc -l:
				Undefined: 6465
				Defined:   3580
			+ ie. for every 'defined' symbol that someone 
			  might legitimately want to look up we have
			  ~2 other symbols that they just -cannot- want
			  to compare.
		+ ie. in some cases by dropping UNDEF symbols, we can
		  have our new / larger .chain section and still save 
		  space [ on disk & in cache ].

	So - some comments on your patch. I notice you don't sort .dynstr as I
did ;-) that simplifies life clearly; of course - I have devious reasons
for wanting that. Wrt. -Bdirect if we sort our relocations right and use
-Bdirect, we can turn linking into a linear walk of both the library
list, .hash, .chain, .dynsym, and .dynstr [ if we play our cards
right ;-]; but that's for another day [ I'd just love
to make linking completely disappear from the L2 cache miss profile
(when using C) - I think it can be done ;-].

	Reading the patch - I really like it :-) my calculation logic to lookup
the hash stuff in a separate section was *really* ugly and evil, and
would have required some re-factoring to make it at all pleasant, your
chain walking code is sweet and pleasant to the eye etc. :-)

	Anyhow - we can build some other rather nice optimisations on top of
this I think; and if we can do some of the above we can save space as
well for a pure gnu-hash system I think. My only concern really is with
extensibility.

	Thanks so much for doing this - of course, as cache sizes increase this
may appear to be less useful[2]; but the faster we can make the linker,
the more we can split up OO.o into more libs, lazy load more,
componentize more and (hopefully) save size/time on startup & at
runtime.

	Regards,

		Michael.

[1] - so your paragraph on the hash:

	"530000 unique dynamic symbols ... Dan Bernstein's hash ... fewest
collisions (only 29) ... The standard SysV ELF hash function gave bad
results ... 1726 collisions"

	Is interesting - and I'm sure the hash you propose is great - but of
course, with the hash comparison the reduction in the number of L2 cache
misses is so great that the 'quality' of the hash at this level is
rather uninteresting wrt. the optimisation [ IHMO etc. ].

	ie. 29/530000 ~= 1726/530000, well nearly anyway ;-) focusing on a
0.005% hit vs. 0.3% is interesting but not worth fighting over. More
interestingly can we pick a few bits to steal and only degrade that by a
factor of 4 or so ? ie. 0.005% vs. 0.02% ;-)

[2] - the Core Duo here with it's 2Mb cache is amazingly faster for OO.o
linking, but clearly at smaller sizes there is this appalling cliff to
drop-off wrt. linking performance.

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-29 19:39 ` Michael Meeks
@ 2006-06-29 21:52   ` Jakub Jelinek
  2006-07-03 15:12     ` DT_GNU_HASH: reducing working set Michael Meeks
  2006-06-30 23:55   ` DT_GNU_HASH: ~ 50% dynamic linking improvement Ulrich Drepper
  2006-07-03  9:26   ` Jakub Jelinek
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Jelinek @ 2006-06-29 21:52 UTC (permalink / raw)
  To: Michael Meeks; +Cc: binutils, libc-alpha, Ulrich Drepper

On Thu, Jun 29, 2006 at 07:27:06PM +0100, Michael Meeks wrote:
> 	3. What we hash:
> 		+ it is incredible to me that UNDEF symbols are
> 		  included in the symbol hash; I see no legitimate
> 		  use for that [ eg. 'free' is an expensive hit in the
> 		  hash for virtually every app ;-]

More comments later, but as you can see in the patch, undefined
symbols are never added to .gnu.hash chains, only defined ones
(and, as I was lazy, also some of the undefined PLT symbols [*]).

That's what the:
+  /* Ignore also local symbols and undefined symbols.  */
+  if (h->forced_local
+      || h->root.type == bfd_link_hash_undefined
+      || h->root.type == bfd_link_hash_undefweak
+      || ((h->root.type == bfd_link_hash_defined
+          || h->root.type == bfd_link_hash_defweak)
+         && h->root.u.def.section->output_section == NULL))
+    return TRUE;
in elf_collect_gnu_hash_codes and elf_renumber_gnu_hash_syms
does, .dynsym after sorting contains first the special symbols
(0, STT_SECTION, STB_LOCAL), then SHN_UNDEF symbols and only
after this contains the defined ones, sorted by the hash % nbuckets
value.

[*] SHN_UNDEF symbols with st_value != 0 need to be in .gnu.hash
(and are there), as they are used when resolving relocations that
take address of functions.  These are at that point still
bfd_link_hash_defined with non-NULL output_section and only
during dynamic symbol finalization are turned into st_shndx SHN_UNDEF.
But, on i386/x86_64 we have an optimization where if the binary
never takes address of some function, st_value on its SHN_UNDEF
symbol is cleared (see e.g. elf64_x86_64_finish_dynamic_symbol
          /* Mark the symbol as undefined, rather than as defined in
             the .plt section.  Leave the value if there were any
             relocations where pointer equality matters (this is a clue
             for the dynamic linker, to make function pointer
             comparisons work between an application and shared
             library), otherwise set it to zero.  If a function is only
             called from a binary, there is no need to slow down
             shared libraries because of that.  */
          sym->st_shndx = SHN_UNDEF;
          if (!h->pointer_equality_needed)
            sym->st_value = 0;
).  Such symbols could be left out of .gnu.hash too, assuming we
add some target hook, as pointer_equality_needed bit is target specific.

	Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-29 19:39 ` Michael Meeks
  2006-06-29 21:52   ` Jakub Jelinek
@ 2006-06-30 23:55   ` Ulrich Drepper
  2006-07-03  9:26   ` Jakub Jelinek
  2 siblings, 0 replies; 15+ messages in thread
From: Ulrich Drepper @ 2006-06-30 23:55 UTC (permalink / raw)
  To: michael.meeks; +Cc: Jakub Jelinek, binutils, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]

Michael Meeks wrote:
> and here was I thinking I was going to have to write a length
> paper to try to convince Ulrich :-) You made my month (really).

What are you talking about?  I never said that I'm opposed to changing
the hash table.  I'm not going to agree to direct bindings, that's all,
and you constantly insisted on bringing that nonsense up.

> 	Anyhow - I've been thinking about various other optimisations, many of
> which we can easily & neatly fit into this nice framework, I've numbered
> them to help chewing them over:
> 
> 	1. Extensibility
> 		+ I want to see a multiply per lookup.
> 		+ ie. since the L2 cache misses are what hammer us -
> 		  if [sic] we want to extend the .chain -Bdirect
> 		  support in future - being able to put that in
> 		  the same record is really important.
> 		+ a 'record size' field gives us that with ~no
> 		  bloat as of now.

You absolutely cannot add direct bindings as an optional feature.
Period.  A binary using direct bindings can and often will have
different effective bindings then a binary which uses normal lookup.
It's complete impossible.  A program can either be compiled and even
_designed_ for direct binding or not.

Adding support in any form for such an incompatible feature to the hash
table which is purely an optional extension is unacceptable.

> 		+ Since we can sort (almost all) of .dynsym -
> 		  there is no problem ensuring that ~all symbols
> 		  that we don't want in .hash occur at the end
> 		  of the .dynsym table [ cf. the next UNDEF point ].

This can only possibly work if you have spare bits...

> 		+ Furthermore - by inspection it can be seen that
> 		  1/2^32 ~=== 1/2^30 [1]
> 			+ using this theorem, we can easily steal
> 			  1/2 bits from the top of the stored hash
> 			  and use them as chain terminators.

... and this is nonsense.  You cannot compute the probabilities by
looking at the complete set of symbols.  You have to determine which
kind of symbols have the same hash value after then change and then make
sure they probability of having those is the same as in the complete
set.  This (most) likely is not the case.

Instead what will happen is that the symbols which are used together in
a program have a higher likelihood to collide (common prefixes etc).  If
even the reduction from 32 to 31 bits produces on the full set a factor
o >2 more duplicates (Jakub tested it) this number is likely higher in
the real world.

On the other hand, what is gained?  You already said we have rather
short hash chains.  Most likely not longer than 5 or 6.  I can count the
number of DSOs with longer chains on my hands and there is not more than
one longer chain in each DSO.  Now do the math: even assuming even
distribution of the chain we have on average 6 chain elements per cache
line.  If the linker sorts them cache lines we can fit in 14.  That's
enough even for the worst chain length.

So, on the one hand you save one 4-byte word which in almost no case
will cause an additional cache line to be loaded.  And even if it would,
the cache lines are consecutive the the processors prefetching takes
case of it.  On the other hand you more than double the likelihood to
have to compare strings.  This is IMO reason enough to not change the
format.

> 	3. What we hash:
> 		+ it is incredible to me that UNDEF symbols are

You didn't read what Jakub wrote.  We don't hash UNDEF records in the
new format.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: ~ 50% dynamic linking improvement
  2006-06-29 19:39 ` Michael Meeks
  2006-06-29 21:52   ` Jakub Jelinek
  2006-06-30 23:55   ` DT_GNU_HASH: ~ 50% dynamic linking improvement Ulrich Drepper
@ 2006-07-03  9:26   ` Jakub Jelinek
  2 siblings, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2006-07-03  9:26 UTC (permalink / raw)
  To: Michael Meeks; +Cc: binutils, libc-alpha, Ulrich Drepper

On Thu, Jun 29, 2006 at 07:27:06PM +0100, Michael Meeks wrote:
> 	1. Extensibility
> 		+ I want to see a multiply per lookup.
> 		+ ie. since the L2 cache misses are what hammer us -
> 		  if [sic] we want to extend the .chain -Bdirect
> 		  support in future - being able to put that in
> 		  the same record is really important.
> 		+ a 'record size' field gives us that with ~no
> 		  bloat as of now.
> 		+ [ and I really think some form of direct linking
> 		  is the future - but we can discuss that later 
> 		  clearly ]
> 		+ ie. having an extensibility story that is better
> 		  than 'just add a new section [ somewhere miles
> 		  away in memory/not-in-cache ] is not so fun.

On the other side, it is IMHO much better to keep using one section
for one purpose, as ELF has always been doing.  So, use a hash section
for mapping a symbol name to a set of indexes of dynamic symbol definition
candidates, not less, not more.  That's what DT_HASH has been doing and
DT_GNU_HASH is doing too.  Direct linking is mainly useful for OSes where
there is one entity controlling the whole thing, in the Open Source world
it can IMHO work only in very limited cases for program which have the
vast majority of dependent libraries coming from the same source, with the
same schedule, with bugfixes leading to shipping a new version of the whole
thing.  If the libraries on the other side are updated independently,
without one body controlling them all, symbols keep being added and removed
(for C the latter would be an ABI break, but for C++ the latter often
happens just when you change the internal implementation of some
function/method and it no longer references some function/method which suddenly
does not need to be compiled in) and direct linking just will lead to
horrible ODR and pointer equality violations that programs will sooner or
later break on.  So for direct linking which certainly won't be used for
all programs, but perhaps just a very limited set, you really should use
a separate section.

> 		+ Furthermore - by inspection it can be seen that
> 		  1/2^32 ~=== 1/2^30 [1]
> 			+ using this theorem, we can easily steal
> 			  1/2 bits from the top of the stored hash
> 			  and use them as chain terminators.
> 			+ ie. lets have:
> 
> .hash[symhash % nbuckets] -> .chain[5]
> 
> .chain[5] -> [ (not-end) | hashval[5] & 0x3fff ]
> .chain[6] -> [ (not-end) | hashval[6] & 0x3fff ]
> .chain[7] -> [ (is-end)  | hashval[7] & 0x3fff ]
> 
> 			+ that saves us several bytes per chain,
> 			  and cf. above, no measurable loss in
> 			  performance. [ of course chain[5] 
> 			  <-> .dynsym[5] here as before ]
> 
> 		+ in fact, for extensibility a per-shlib hash
> 		  comparison 'mask' field would make for a rather
> 		  better behaviour - so we can snarf bits from
> 		  here in future if we come up with brighter ideas.

Applications keep growing every year and while I currently counted
~ 500000 different symbol names, it can soon double, triple and keep
increasing.  So, while we have ATM just a few collisions, there can be
more and more in the future.  And handling those is expensive.
And, all linker can do for this is try to keep all the chains short enough
to fit into cachelines.

	Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: reducing working set ...
  2006-06-29 21:52   ` Jakub Jelinek
@ 2006-07-03 15:12     ` Michael Meeks
  2006-07-03 15:59       ` Jakub Jelinek
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Meeks @ 2006-07-03 15:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: binutils, libc-alpha, Ulrich Drepper

Hi there,

On Thu, 2006-06-29 at 21:39 +0200, Jakub Jelinek wrote:
> More comments later, but as you can see in the patch, undefined
> symbols are never added to .gnu.hash chains, only defined ones

	Ah - cool :-) thanks for that.

> Jakub wrote in reply to:
> On Thu, Jun 29, 2006 at 07:27:06PM +0100, Michael Meeks wrote:
> >       1. Extensibility
>
> On the other side, it is IMHO much better to keep using one
> section for one purpose, as ELF has always been doing.

	Yes - you're quite right - it's crack smoking to want to put -Bdirect
stuff in that section anyway, as you say: a few minutes thought in a
less excited state convinced me of that [ sadly while off line ].

	OTOH - I do think that allowing the stealing of some bits from the hash
code [ for other purposes ] in future is prudent, and need not perform
badly:

	if (!(new_hash ^ l->chain[foo]) & l->new_hash_mask))
		... hit.

	etc. shouldn't be overly slower than a straight comparison surely ?

> Applications keep growing every year and while I currently counted
> ~ 500000 different symbol names, it can soon double, triple and keep
> increasing. 

	From what I can see - your method was to grab ~all the symbols on your
system, and compare hashes for all of them ? and we only got <100
collisions ? I take Ulrich's point that perhaps many of these are in the
same module [be interesting to see] due to highly similar names etc. but
even so this is a tiny collision rate.

>	And, all linker can do for this is try to keep all the chains
> short enough to fit into cachelines.

	So - perhaps you can help me here; I've been working with a mental
model of several levels of cache - whereby the more you can compress the
working set, the faster access will be. Now I appreciate the cache line
argument - that after the 1st access, reads inside the same cache line
are really fast [ and (I infer from Ulrich's post) adjacent cache line
reads are free ].

On Fri, 2006-06-30 at 12:36 -0700, Ulrich Drepper wrote:
> On the other hand, what is gained?  You already said we have rather
> short hash chains.  Most likely not longer than 5 or 6.  I can count the
> number of DSOs with longer chains on my hands and there is not more than
> one longer chain in each DSO.

	Jakub recently posted this sort of thing:

[snip]
libstdc++.so.6.0.8

Histogram for `.gnu.hash' bucket list length (total of 1022 buckets):
 Length  Number     % of total  Coverage
      0  52         (  5.1%)
      1  131        ( 12.8%)      4.1%
      2  225        ( 22.0%)     18.4%
      3  233        ( 22.8%)     40.5%
      4  171        ( 16.7%)     62.2%
      5  115        ( 11.3%)     80.4%
      6  64         (  6.3%)     92.5%
      7  18         (  1.8%)     96.5%
      8  8          (  0.8%)     98.5%
      9  4          (  0.4%)     99.7%
     10  1          (  0.1%)    100.0%
...
For libc-2.4.90.so:
...
Histogram for `.gnu.hash' bucket list length (total of 1011 buckets):
 Length  Number     % of total  Coverage
      0  124        ( 12.3%)
      1  254        ( 25.1%)     12.4%
      2  296        ( 29.3%)     41.2%
      3  198        ( 19.6%)     70.2%
      4  98         (  9.7%)     89.3%
      5  29         (  2.9%)     96.4%
      6  10         (  1.0%)     99.3%
      7  2          (  0.2%)    100.0%
[snip]

	But sure - long chains should be unusual. It's perhaps interesting to
consider where the performance balance between expanding the base .hash
to get more 0xffffffff hits, vs. shrinking it to fit more of it into L1
cache is.

>   Now do the math: even assuming even distribution of the chain we have
> on average 6 chain elements per cache line.  If the linker sorts them
> cache lines we can fit in 14.  That's enough even for the worst chain
> length.

	By "sorts them cache lines" I assume you mean adding padding [perhaps undef]
records / alignment to .dynsym to maximise the length of chain inside
cache lines ? that could certainly be done later; interesting.

> So, on the one hand you save one 4-byte word which in almost no case
> will cause an additional cache line to be loaded.  And even if it would,
> the cache lines are consecutive the the processors prefetching takes
> case of it.  On the other hand you more than double the likelihood to
> have to compare strings.  This is IMO reason enough to not change the
> format.

	Nah - let me do the maths another way and perhaps I'll get the result
I'm looking for ;-) by crunching the above numbers that Jakub provided -
I see an avg. no. of comparisons per chain of:

libstdc++	3.09		# sum({count*length})/sum(count)
glibc		2.03

	So a finger-in-the-air avg. chain length is perhaps ~2.5. [1]

	My proposed alternate design sacrifices 1 bit of the hash, lets (for
the sake of argument) assume this has the most awful performance
degredation of a factor of 10 (instead of perhaps ~2-4).

	The proposed alternate design ('stolen bit') is engineered thus:

	* By sorting ~all not-hashed .dynsym entried to the 'end' we end
	  up with contiguous runs of .dynsym entries [ as now ] in
	  chains, adjacent to each other. Thus - by stealing a
	  terminator bit we avoid needing a chain length. [ the 'offset'
	  in the .hash can then just be a .dynsym index ]

	So - lets also assume we have 10^6 unique symbols across a few
libraries - just for argument.

	It seems then that the likelihood (with a purely random access pattern)
of a cache hit [ ie. not taking a L1/L2 cache miss ;-] is ~=
sizeof(cache)/sizeof(data_set).

	With a 64k L1 cache: and a 4Mb working set (stolen-bit) this gives a
1.6% hit rate. [ for a large L2 of course it's higher, but lets look a
the worst case, smallest cache, where this effect is lowest ]

	If we increase the size of each field by 4 bytes per 2.5 records [ as
per the existing 'chain-len' design ] from our avg 400k chains we get: a
1.6Mb increase (factor of 1.4) to a 5.6Mb working set. This gives us a
lower cache hit rate of 1.1%.

	But - of course; as you point out there is a penalty for hash
collisions - by reducing precision we loose on comparison:

	32bits	- 30/530000 = 0.005% false positives
	31bits	- 10x worse = 0.05%

	So - taking our 1 million record hypothetical set:

		L1 hits		false positives
chain-len.	11k		56
stolen-bit	16k		560

	So - if we assume a (conservative) cost of a false positive of ~10 L2
cache misses [ it's more like 3 ] then this seems reasonable.

	So - of course each false positive is prolly less than the 10 L2 cache
misses necessary to make these compare (right?). As the cache grows, of
course the assumptions start to fail, but with a 2Mb L2 cache

            L1 hits: 64k   L2: 2Mb   false positives
chain-len.  11k            500k      56
stolen-bit  16k            360k      560

	So now we need a false positive to generate 280 L2 misses to get to a
performance parity for the (existing) chain-len solution.

	Of course, it's entirely probable that my numbers, assumptions,
calculations and raison-d'etre are entirely incorrect :-) Of course,
wrt. data set size much more than just searching in the .dynsym section
is going on at link time diluting the argument [etc.] Is any of it even
slightly convincing ?

	Either way - since I now have time; if you're interested I'm happy to
try to hack this up so it can be profiled both ways; presumably
cachegrind can give a nice repeatable answer. If cachegrind shows it
being more cache efficient - would you consider a re-work / simplify ?

	Thanks,

		Michael.

[1] - of course assuming that we almost never hit - which I believe is
the common case.
-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: reducing working set ...
  2006-07-03 15:12     ` DT_GNU_HASH: reducing working set Michael Meeks
@ 2006-07-03 15:59       ` Jakub Jelinek
  2006-07-03 18:18         ` Michael Meeks
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Jelinek @ 2006-07-03 15:59 UTC (permalink / raw)
  To: Michael Meeks; +Cc: binutils, libc-alpha, Ulrich Drepper

On Mon, Jul 03, 2006 at 04:15:20PM +0100, Michael Meeks wrote:
> 	Yes - you're quite right - it's crack smoking to want to put -Bdirect
> stuff in that section anyway, as you say: a few minutes thought in a
> less excited state convinced me of that [ sadly while off line ].

Ok, good we are on the same page ;)

> 	OTOH - I do think that allowing the stealing of some bits from the hash
> code [ for other purposes ] in future is prudent, and need not perform
> badly:
> 
> 	if (!(new_hash ^ l->chain[foo]) & l->new_hash_mask))
> 		... hit.
> 
> 	etc. shouldn't be overly slower than a straight comparison surely ?

Djamel privately mentioned to me today that indeed if we require
nbuckets > 1 for valid .gnu.hash sections, we can get the chain
termination bit for free, i.e. not causing any extra collisions.
That's because if nbuckets > 1, then (hash+1) % nbuckets != hash % nbuckets,
so the bottom-most bit is effectively encoded in the bucket.

So we can get rid of the chain length words, and in that case we can
get rid of the symbol index word as well.

We have 2 options, either we create a 1:1 correspondence between the chain
position and .dynsym index (and perhaps sort the UNDEF symbols high, so that
we can trim the end of the .gnu.hash section), or we can store some addend
into the second word of .gnu.hash (before the buckets array).
ELF requires that the STB_LOCAL symbols are at the beginning, not end of
the .dynsym section.  The UNDEF symbols can of course be anywhere (after
the STB_LOCAL symbols).  On most of the arches there are only a few
STB_LOCAL symbols (< ~20), but on e.g. ia64 there are really many
(e.g. ia64 glibc has ~ 420 such symbols, all caused by relocations).
So, perhaps using the addend would be best.

It will limit us a little bit in the reordering to minimize chains crossing
cacheline boundaries in that we no longer can have padding/gaps in there
(well, we could store some UNDEF symbols there), but we still can do some
reordering.

I will rework the patch now.

	Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DT_GNU_HASH: reducing working set ...
  2006-07-03 15:59       ` Jakub Jelinek
@ 2006-07-03 18:18         ` Michael Meeks
  2006-07-03 21:05           ` [PATCH] DT_GNU_HASH: reducing working set ... (take 2) Jakub Jelinek
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Meeks @ 2006-07-03 18:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: binutils, libc-alpha, Ulrich Drepper


On Mon, 2006-07-03 at 17:59 +0200, Jakub Jelinek wrote:
> Ok, good we are on the same page ;)

	:-)

> Djamel privately mentioned to me today that indeed if we require
> nbuckets > 1 for valid .gnu.hash sections, we can get the chain
> termination bit for free, i.e. not causing any extra collisions.
> That's because if nbuckets > 1, then (hash+1) % nbuckets != hash % nbuckets,
> so the bottom-most bit is effectively encoded in the bucket.

	Ah - that's a neat trick: nice [ not that it isn't worth the single bit
anyway of course, as per convoluted calculations ;-]

> We have 2 options, either we create a 1:1 correspondence between the chain
> position and .dynsym index (and perhaps sort the UNDEF symbols high, so that
> we can trim the end of the .gnu.hash section), or we can store some addend
> into the second word of .gnu.hash (before the buckets array).

	The addend sounds nice.

	Of course - it depends on how hard to calculate the hash function is:
we could of course use a rather stronger / slower hash: and simply store
the value for all symbols [ regardless of def/undef etc. ] in the .chain
- and instead of calculating, look them up. [ Of course to calculate the
string hash we have to take a .dynstr cache miss in the 1st place so -
perhaps it's worthwhile ? ].

	Of course, if we do that, then it does in fact make sense to have the
-Bdirect indexes next to the undef symbols, so - OTOH - perhaps that
sort of thing does belong in another section;

> ELF requires that the STB_LOCAL symbols are at the beginning, not end of
> the .dynsym section.

	Yep - I fell over that :-)

>   The UNDEF symbols can of course be anywhere (after
> the STB_LOCAL symbols).  On most of the arches there are only a few
> STB_LOCAL symbols (< ~20), but on e.g. ia64 there are really many
> (e.g. ia64 glibc has ~ 420 such symbols, all caused by relocations).
> So, perhaps using the addend would be best.

	That's interesting; I have an architecturally blinkered view.

> I will rework the patch now.

	Thanks,

		Michael.	

-- 
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] DT_GNU_HASH: reducing working set ... (take 2)
  2006-07-03 18:18         ` Michael Meeks
@ 2006-07-03 21:05           ` Jakub Jelinek
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Jelinek @ 2006-07-03 21:05 UTC (permalink / raw)
  To: binutils; +Cc: libc-alpha, Ulrich Drepper, Michael Meeks

On Mon, Jul 03, 2006 at 07:21:22PM +0100, Michael Meeks wrote:
> > I will rework the patch now.

Here is the updated patch, Ulrich has the corresponding glibc bits and it
passed all testing we have done so far on it.

There is still room for improvement in the chain reordering to minimize
number of chains crossing cacheline boundaries, but that optimization
doesn't change anything in the section format, so I think it can be left
for later.

Ok for trunk?

2006-07-03  Jakub Jelinek  <jakub@redhat.com>

include/
	* bfdlink.h (struct bfd_link_info): Add emit_hash and
	emit_gnu_hash bitfields.
include/elf/
	* common.h (SHT_GNU_HASH, DT_GNU_HASH): Define.
ld/
	* scripttempl/elf.sc: Add .gnu.hash section.
	* emultempl/elf32.em (OPTION_HASH_STYLE): Define.
	(gld${EMULATION_NAME}_add_options): Register --hash-style option.
	(gld${EMULATION_NAME}_handle_option): Handle it.
	(gld${EMULATION_NAME}_list_options): Document it.
	* ldmain.c (main): Initialize emit_hash and emit_gnu_hash.
	* ld.texinfo: Document --hash-style option.
bfd/
	* elf.c (_bfd_elf_print_private_bfd_data): Handle DT_GNU_HASH.
	(bfd_section_from_shdr, elf_fake_sections, assign_section_numbers):
	Handle SHT_GNU_HASH.
	(special_sections_g): Include .gnu.hash section.
	(bfd_elf_gnu_hash): New function.
	* elf-bfd.h (bfd_elf_gnu_hash): New prototype.
	* elflink.c (_bfd_elf_link_create_dynamic_sections): Create .hash
	only if info->emit_hash, create .gnu.hash section if
	info->emit_gnu_hash.
	(struct collect_gnu_hash_codes): New type.
	(elf_collect_gnu_hash_codes, elf_renumber_gnu_hash_syms): New
	functions.
	(compute_bucket_count): Don't compute HASHCODES array, instead add
	that and NSYMS as arguments.  Use bed->s->sizeof_hash_entry
	instead of bed->s->arch_size / 8.  Fix .hash size estimation.
	When not optimizing, use the number of hashed symbols rather than
	dynsymcount.
	(bfd_elf_size_dynamic_sections): Only add DT_HASH if info->emit_hash,
	and ADD DT_GNU_HASH if info->emit_gnu_hash.
	(bfd_elf_size_dynsym_hash_dynstr): Size .hash only if info->emit_hash,
	adjust compute_bucket_count caller.  Create and populate .gnu.hash
	section if info->emit_gnu_hash.
	(elf_link_output_extsym): Only populate .hash section if
	finfo->hash_sec != NULL.
	(bfd_elf_final_link): Adjust assertion.  Handle DT_GNU_HASH.
binutils/
	* readelf.c (get_dynamic_type): Handle DT_GNU_HASH.
	(get_section_type_name): Handle SHT_GNU_HASH.
	(dynamic_info_DT_GNU_HASH): New variable.
	(process_dynamic_section): Handle DT_GNU_HASH.
	(process_symbol_table): Print also DT_GNU_HASH histogram.

--- ld/scripttempl/elf.sc.jj	2006-01-01 01:02:16.000000000 +0100
+++ ld/scripttempl/elf.sc	2006-06-22 11:11:53.000000000 +0200
@@ -260,6 +260,7 @@ SECTIONS
   ${INITIAL_READONLY_SECTIONS}
   ${TEXT_DYNAMIC+${DYNAMIC}}
   .hash         ${RELOCATING-0} : { *(.hash) }
+  .gnu.hash     ${RELOCATING-0} : { *(.gnu.hash) }
   .dynsym       ${RELOCATING-0} : { *(.dynsym) }
   .dynstr       ${RELOCATING-0} : { *(.dynstr) }
   .gnu.version  ${RELOCATING-0} : { *(.gnu.version) }
--- ld/ldmain.c.jj	2006-06-01 15:50:33.000000000 +0200
+++ ld/ldmain.c	2006-06-22 11:21:11.000000000 +0200
@@ -304,6 +304,8 @@ main (int argc, char **argv)
   link_info.create_object_symbols_section = NULL;
   link_info.gc_sym_list = NULL;
   link_info.base_file = NULL;
+  link_info.emit_hash = TRUE;
+  link_info.emit_gnu_hash = FALSE;
   /* SVR4 linkers seem to set DT_INIT and DT_FINI based on magic _init
      and _fini symbols.  We are compatible.  */
   link_info.init_function = "_init";
--- ld/ld.texinfo.jj	2006-06-15 14:31:06.000000000 +0200
+++ ld/ld.texinfo	2006-06-22 14:03:21.000000000 +0200
@@ -1883,6 +1883,14 @@ time it takes the linker to perform its 
 increasing the linker's memory requirements.  Similarly reducing this
 value can reduce the memory requirements at the expense of speed.
 
+@kindex --hash-style=@var{style}
+@item --hash-style=@var{style}
+Set the type of linker's hash table(s).  @var{style} can be either
+@code{sysv} for classic ELF @code{.hash} section, @code{gnu} for
+new style GNU @code{.gnu.hash} section or @code{both} for both
+the classic ELF @code{.hash} and new style GNU @code{.gnu.hash}
+hash tables.  The default is @code{sysv}.
+
 @kindex --reduce-memory-overheads
 @item --reduce-memory-overheads
 This option reduces memory requirements at ld runtime, at the expense of
--- ld/emultempl/elf32.em.jj	2006-06-20 18:34:24.000000000 +0200
+++ ld/emultempl/elf32.em	2006-06-22 14:39:25.000000000 +0200
@@ -1719,6 +1719,7 @@ cat >>e${EMULATION_NAME}.c <<EOF
 #define OPTION_GROUP			(OPTION_ENABLE_NEW_DTAGS + 1)
 #define OPTION_EH_FRAME_HDR		(OPTION_GROUP + 1)
 #define OPTION_EXCLUDE_LIBS		(OPTION_EH_FRAME_HDR + 1)
+#define OPTION_HASH_STYLE		(OPTION_EXCLUDE_LIBS + 1)
 
 static void
 gld${EMULATION_NAME}_add_options
@@ -1735,6 +1736,7 @@ cat >>e${EMULATION_NAME}.c <<EOF
     {"enable-new-dtags", no_argument, NULL, OPTION_ENABLE_NEW_DTAGS},
     {"eh-frame-hdr", no_argument, NULL, OPTION_EH_FRAME_HDR},
     {"exclude-libs", required_argument, NULL, OPTION_EXCLUDE_LIBS},
+    {"hash-style", required_argument, NULL, OPTION_HASH_STYLE},
     {"Bgroup", no_argument, NULL, OPTION_GROUP},
 EOF
 fi
@@ -1791,6 +1793,22 @@ cat >>e${EMULATION_NAME}.c <<EOF
       add_excluded_libs (optarg);
       break;
 
+    case OPTION_HASH_STYLE:
+      link_info.emit_hash = FALSE;
+      link_info.emit_gnu_hash = FALSE;
+      if (strcmp (optarg, "sysv") == 0)
+	link_info.emit_hash = TRUE;
+      else if (strcmp (optarg, "gnu") == 0)
+	link_info.emit_gnu_hash = TRUE;
+      else if (strcmp (optarg, "both") == 0)
+	{
+	  link_info.emit_hash = TRUE;
+	  link_info.emit_gnu_hash = TRUE;
+	}
+      else
+	einfo (_("%P%F: invalid hash style \`%s'\n"), optarg);
+      break;
+
     case 'z':
       if (strcmp (optarg, "initfirst") == 0)
 	link_info.flags_1 |= (bfd_vma) DF_1_INITFIRST;
@@ -1894,6 +1912,7 @@ cat >>e${EMULATION_NAME}.c <<EOF
   fprintf (file, _("  --disable-new-dtags\tDisable new dynamic tags\n"));
   fprintf (file, _("  --enable-new-dtags\tEnable new dynamic tags\n"));
   fprintf (file, _("  --eh-frame-hdr\tCreate .eh_frame_hdr section\n"));
+  fprintf (file, _("  --hash-style=STYLE\tSet hash style to sysv, gnu or both\n"));
   fprintf (file, _("  -z combreloc\t\tMerge dynamic relocs into one section and sort\n"));
   fprintf (file, _("  -z defs\t\tReport unresolved symbols in object files.\n"));
   fprintf (file, _("  -z execstack\t\tMark executable as requiring executable stack\n"));
--- bfd/elf-bfd.h.jj	2006-06-20 18:34:24.000000000 +0200
+++ bfd/elf-bfd.h	2006-06-26 16:17:53.000000000 +0200
@@ -1481,6 +1481,8 @@ extern bfd_vma _bfd_elf_section_offset
 
 extern unsigned long bfd_elf_hash
   (const char *);
+extern unsigned long bfd_elf_gnu_hash
+  (const char *);
 
 extern bfd_reloc_status_type bfd_elf_generic_reloc
   (bfd *, arelent *, asymbol *, void *, asection *, bfd *, char **);
--- bfd/elf.c.jj	2006-06-20 18:34:24.000000000 +0200
+++ bfd/elf.c	2006-06-26 16:17:28.000000000 +0200
@@ -206,6 +206,21 @@ bfd_elf_hash (const char *namearg)
   return h & 0xffffffff;
 }
 
+/* DT_GNU_HASH hash function.  Do not change this function; you will
+   cause invalid hash tables to be generated.  */
+
+unsigned long
+bfd_elf_gnu_hash (const char *namearg)
+{
+  const unsigned char *name = (const unsigned char *) namearg;
+  unsigned long h = 5381;
+  unsigned char ch;
+
+  while ((ch = *name++) != '\0')
+    h = (h << 5) + h + ch;
+  return h & 0xffffffff;
+}
+
 bfd_boolean
 bfd_elf_mkobject (bfd *abfd)
 {
@@ -1239,6 +1254,7 @@ _bfd_elf_print_private_bfd_data (bfd *ab
 	    case DT_AUXILIARY: name = "AUXILIARY"; stringp = TRUE; break;
 	    case DT_USED: name = "USED"; break;
 	    case DT_FILTER: name = "FILTER"; stringp = TRUE; break;
+	    case DT_GNU_HASH: name = "GNU_HASH"; break;
 	    }
 
 	  fprintf (f, "  %-11s ", name);
@@ -1823,6 +1839,7 @@ bfd_section_from_shdr (bfd *abfd, unsign
     case SHT_FINI_ARRAY:	/* .fini_array section.  */
     case SHT_PREINIT_ARRAY:	/* .preinit_array section.  */
     case SHT_GNU_LIBLIST:	/* .gnu.liblist section.  */
+    case SHT_GNU_HASH:		/* .gnu.hash section.  */
       return _bfd_elf_make_section_from_shdr (abfd, hdr, name, shindex);
 
     case SHT_DYNAMIC:	/* Dynamic linking information.  */
@@ -2295,6 +2312,7 @@ static const struct bfd_elf_special_sect
   { ".gnu.version_r", 14,  0, SHT_GNU_verneed, 0 },
   { ".gnu.liblist",   12,  0, SHT_GNU_LIBLIST, SHF_ALLOC },
   { ".gnu.conflict",  13,  0, SHT_RELA,     SHF_ALLOC },
+  { ".gnu.hash",       9,  0, SHT_GNU_HASH, SHF_ALLOC },
   { NULL,              0,  0, 0,            0 }
 };
 
@@ -2811,6 +2829,10 @@ elf_fake_sections (bfd *abfd, asection *
     case SHT_GROUP:
       this_hdr->sh_entsize = 4;
       break;
+
+    case SHT_GNU_HASH:
+      this_hdr->sh_entsize = 4;
+      break;
     }
 
   if ((asect->flags & SEC_ALLOC) != 0)
@@ -3256,6 +3278,7 @@ assign_section_numbers (bfd *abfd, struc
 	  break;
 
 	case SHT_HASH:
+	case SHT_GNU_HASH:
 	case SHT_GNU_versym:
 	  /* sh_link is the section header index of the symbol table
 	     this hash table or version table is for.  */
--- bfd/elflink.c.jj	2006-06-20 18:34:53.000000000 +0200
+++ bfd/elflink.c	2006-07-03 18:26:47.000000000 +0200
@@ -240,12 +240,24 @@ _bfd_elf_link_create_dynamic_sections (b
   if (!_bfd_elf_define_linkage_sym (abfd, info, s, "_DYNAMIC"))
     return FALSE;
 
-  s = bfd_make_section_with_flags (abfd, ".hash",
-				   flags | SEC_READONLY);
-  if (s == NULL
-      || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
-    return FALSE;
-  elf_section_data (s)->this_hdr.sh_entsize = bed->s->sizeof_hash_entry;
+  if (info->emit_hash)
+    {
+      s = bfd_make_section_with_flags (abfd, ".hash", flags | SEC_READONLY);
+      if (s == NULL
+	  || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
+	return FALSE;
+      elf_section_data (s)->this_hdr.sh_entsize = bed->s->sizeof_hash_entry;
+    }
+
+  if (info->emit_gnu_hash)
+    {
+      s = bfd_make_section_with_flags (abfd, ".gnu.hash",
+				       flags | SEC_READONLY);
+      if (s == NULL
+	  || ! bfd_set_section_alignment (abfd, s, bed->s->log_file_align))
+	return FALSE;
+      elf_section_data (s)->this_hdr.sh_entsize = 4;
+    }
 
   /* Let the backend create the rest of the sections.  This lets the
      backend set the right flags.  The backend will normally create
@@ -4811,6 +4823,118 @@ elf_collect_hash_codes (struct elf_link_
   return TRUE;
 }
 
+struct collect_gnu_hash_codes
+{
+  bfd *output_bfd;
+  unsigned long int nsyms;
+  unsigned long int *hashcodes;
+  unsigned long int *hashval;
+  unsigned long int *indx;
+  unsigned long int *counts;
+  bfd_byte *contents;
+  long int min_dynindx;
+  unsigned long int bucketcount;
+  unsigned long int symindx;
+  long int local_indx;
+};
+
+/* This function will be called though elf_link_hash_traverse to store
+   all hash value of the exported symbols in an array.  */
+
+static bfd_boolean
+elf_collect_gnu_hash_codes (struct elf_link_hash_entry *h, void *data)
+{
+  struct collect_gnu_hash_codes *s = data;
+  const char *name;
+  char *p;
+  unsigned long ha;
+  char *alc = NULL;
+
+  if (h->root.type == bfd_link_hash_warning)
+    h = (struct elf_link_hash_entry *) h->root.u.i.link;
+
+  /* Ignore indirect symbols.  These are added by the versioning code.  */
+  if (h->dynindx == -1)
+    return TRUE;
+
+  /* Ignore also local symbols and undefined symbols.  */
+  if (h->forced_local
+      || h->root.type == bfd_link_hash_undefined
+      || h->root.type == bfd_link_hash_undefweak
+      || ((h->root.type == bfd_link_hash_defined
+	   || h->root.type == bfd_link_hash_defweak)
+	  && h->root.u.def.section->output_section == NULL))
+    return TRUE;
+
+  name = h->root.root.string;
+  p = strchr (name, ELF_VER_CHR);
+  if (p != NULL)
+    {
+      alc = bfd_malloc (p - name + 1);
+      memcpy (alc, name, p - name);
+      alc[p - name] = '\0';
+      name = alc;
+    }
+
+  /* Compute the hash value.  */
+  ha = bfd_elf_gnu_hash (name);
+
+  /* Store the found hash value in the array for compute_bucket_count,
+     and also for .dynsym reordering purposes.  */
+  s->hashcodes[s->nsyms] = ha;
+  s->hashval[h->dynindx] = ha;
+  ++s->nsyms;
+  if (s->min_dynindx < 0 || s->min_dynindx > h->dynindx)
+    s->min_dynindx = h->dynindx;
+
+  if (alc != NULL)
+    free (alc);
+
+  return TRUE;
+}
+
+/* This function will be called though elf_link_hash_traverse to do
+   final dynaminc symbol renumbering.  */
+
+static bfd_boolean
+elf_renumber_gnu_hash_syms (struct elf_link_hash_entry *h, void *data)
+{
+  struct collect_gnu_hash_codes *s = data;
+  unsigned long int bucket;
+  unsigned long int val;
+
+  if (h->root.type == bfd_link_hash_warning)
+    h = (struct elf_link_hash_entry *) h->root.u.i.link;
+
+  /* Ignore indirect symbols.  */
+  if (h->dynindx == -1)
+    return TRUE;
+
+  /* Ignore also local symbols and undefined symbols.  */
+  if (h->forced_local
+      || h->root.type == bfd_link_hash_undefined
+      || h->root.type == bfd_link_hash_undefweak
+      || ((h->root.type == bfd_link_hash_defined
+	   || h->root.type == bfd_link_hash_defweak)
+	  && h->root.u.def.section->output_section == NULL))
+    {
+      if (h->dynindx >= s->min_dynindx)
+	h->dynindx = s->local_indx++;
+      return TRUE;
+    }
+
+  bucket = s->hashval[h->dynindx] % s->bucketcount;
+  val = s->hashval[h->dynindx] & ~(unsigned long int) 1;
+  if (s->counts[bucket] == 1)
+    /* Last element terminates the chain.  */
+    val |= 1;
+  bfd_put_32 (s->output_bfd, val,
+	      s->contents + (s->indx[bucket] - s->symindx) * 4);
+  --s->counts[bucket];
+  h->dynindx = s->indx[bucket]++;
+  return TRUE;
+}
+
 /* Array used to determine the number of hash table buckets to use
    based on the number of symbols there are.  If there are fewer than
    3 symbols we use 1 bucket, fewer than 17 symbols we use 3 buckets,
@@ -4832,42 +4956,26 @@ static const size_t elf_buckets[] =
    Therefore the result is always a good payoff between few collisions
    (= short chain lengths) and table size.  */
 static size_t
-compute_bucket_count (struct bfd_link_info *info)
+compute_bucket_count (struct bfd_link_info *info, unsigned long int *hashcodes,
+		      unsigned long int nsyms)
 {
   size_t dynsymcount = elf_hash_table (info)->dynsymcount;
   size_t best_size = 0;
-  unsigned long int *hashcodes;
-  unsigned long int *hashcodesp;
   unsigned long int i;
   bfd_size_type amt;
 
-  /* Compute the hash values for all exported symbols.  At the same
-     time store the values in an array so that we could use them for
-     optimizations.  */
-  amt = dynsymcount;
-  amt *= sizeof (unsigned long int);
-  hashcodes = bfd_malloc (amt);
-  if (hashcodes == NULL)
-    return 0;
-  hashcodesp = hashcodes;
-
-  /* Put all hash values in HASHCODES.  */
-  elf_link_hash_traverse (elf_hash_table (info),
-			  elf_collect_hash_codes, &hashcodesp);
-
   /* We have a problem here.  The following code to optimize the table
      size requires an integer type with more the 32 bits.  If
      BFD_HOST_U_64_BIT is set we know about such a type.  */
 #ifdef BFD_HOST_U_64_BIT
   if (info->optimize)
     {
-      unsigned long int nsyms = hashcodesp - hashcodes;
       size_t minsize;
       size_t maxsize;
       BFD_HOST_U_64_BIT best_chlen = ~((BFD_HOST_U_64_BIT) 0);
-      unsigned long int *counts ;
       bfd *dynobj = elf_hash_table (info)->dynobj;
       const struct elf_backend_data *bed = get_elf_backend_data (dynobj);
+      unsigned long int *counts;
 
       /* Possible optimization parameters: if we have NSYMS symbols we say
 	 that the hashing table must at least have NSYMS/4 and at most
@@ -4883,10 +4991,7 @@ compute_bucket_count (struct bfd_link_in
       amt *= sizeof (unsigned long int);
       counts = bfd_malloc (amt);
       if (counts == NULL)
-	{
-	  free (hashcodes);
-	  return 0;
-	}
+	return 0;
 
       /* Compute the "optimal" size for the hash table.  The criteria is a
 	 minimal chain length.  The minor criteria is (of course) the size
@@ -4913,9 +5018,9 @@ compute_bucket_count (struct bfd_link_in
 #  define BFD_TARGET_PAGESIZE	(4096)
 # endif
 
-	  /* We in any case need 2 + NSYMS entries for the size values and
-	     the chains.  */
-	  max = (2 + nsyms) * (bed->s->arch_size / 8);
+	  /* We in any case need 2 + DYNSYMCOUNT entries for the size values
+	     and the chains.  */
+	  max = (2 + dynsymcount) * bed->s->sizeof_hash_entry;
 
 # if 1
 	  /* Variant 1: optimize for short chains.  We add the squares
@@ -4925,7 +5030,7 @@ compute_bucket_count (struct bfd_link_in
 	    max += counts[j] * counts[j];
 
 	  /* This adds penalties for the overall size of the table.  */
-	  fact = i / (BFD_TARGET_PAGESIZE / (bed->s->arch_size / 8)) + 1;
+	  fact = i / (BFD_TARGET_PAGESIZE / bed->s->sizeof_hash_entry) + 1;
 	  max *= fact * fact;
 # else
 	  /* Variant 2: Optimize a lot more for small table.  Here we
@@ -4936,7 +5041,7 @@ compute_bucket_count (struct bfd_link_in
 
 	  /* The overall size of the table is considered, but not as
 	     strong as in variant 1, where it is squared.  */
-	  fact = i / (BFD_TARGET_PAGESIZE / (bed->s->arch_size / 8)) + 1;
+	  fact = i / (BFD_TARGET_PAGESIZE / bed->s->sizeof_hash_entry) + 1;
 	  max *= fact;
 # endif
 
@@ -4959,14 +5064,11 @@ compute_bucket_count (struct bfd_link_in
       for (i = 0; elf_buckets[i] != 0; i++)
 	{
 	  best_size = elf_buckets[i];
-	  if (dynsymcount < elf_buckets[i + 1])
+	  if (nsyms < elf_buckets[i + 1])
 	    break;
 	}
     }
 
-  /* Free the arrays we needed.  */
-  free (hashcodes);
-
   return best_size;
 }
 
@@ -5324,7 +5426,10 @@ bfd_elf_size_dynamic_sections (bfd *outp
 	  bfd_size_type strsize;
 
 	  strsize = _bfd_elf_strtab_size (elf_hash_table (info)->dynstr);
-	  if (!_bfd_elf_add_dynamic_entry (info, DT_HASH, 0)
+	  if ((info->emit_hash
+	       && !_bfd_elf_add_dynamic_entry (info, DT_HASH, 0))
+	      || (info->emit_gnu_hash
+		  && !_bfd_elf_add_dynamic_entry (info, DT_GNU_HASH, 0))
 	      || !_bfd_elf_add_dynamic_entry (info, DT_STRTAB, 0)
 	      || !_bfd_elf_add_dynamic_entry (info, DT_SYMTAB, 0)
 	      || !_bfd_elf_add_dynamic_entry (info, DT_STRSZ, strsize)
@@ -5726,8 +5831,6 @@ bfd_elf_size_dynsym_hash_dynstr (bfd *ou
       asection *s;
       bfd_size_type dynsymcount;
       unsigned long section_sym_count;
-      size_t bucketcount = 0;
-      size_t hash_entry_size;
       unsigned int dtagcount;
 
       dynobj = elf_hash_table (info)->dynobj;
@@ -5778,23 +5881,150 @@ bfd_elf_size_dynsym_hash_dynstr (bfd *ou
 	  memset (s->contents, 0, section_sym_count * bed->s->sizeof_sym);
 	}
 
+      elf_hash_table (info)->bucketcount = 0;
+
       /* Compute the size of the hashing table.  As a side effect this
 	 computes the hash values for all the names we export.  */
-      bucketcount = compute_bucket_count (info);
+      if (info->emit_hash)
+	{
+	  unsigned long int *hashcodes;
+	  unsigned long int *hashcodesp;
+	  bfd_size_type amt;
+	  unsigned long int nsyms;
+	  size_t bucketcount;
+	  size_t hash_entry_size;
+
+	  /* Compute the hash values for all exported symbols.  At the same
+	     time store the values in an array so that we could use them for
+	     optimizations.  */
+	  amt = dynsymcount * sizeof (unsigned long int);
+	  hashcodes = bfd_malloc (amt);
+	  if (hashcodes == NULL)
+	    return FALSE;
+	  hashcodesp = hashcodes;
 
-      s = bfd_get_section_by_name (dynobj, ".hash");
-      BFD_ASSERT (s != NULL);
-      hash_entry_size = elf_section_data (s)->this_hdr.sh_entsize;
-      s->size = ((2 + bucketcount + dynsymcount) * hash_entry_size);
-      s->contents = bfd_zalloc (output_bfd, s->size);
-      if (s->contents == NULL)
-	return FALSE;
+	  /* Put all hash values in HASHCODES.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_collect_hash_codes, &hashcodesp);
+
+	  nsyms = hashcodesp - hashcodes;
+	  bucketcount
+	    = compute_bucket_count (info, hashcodes, nsyms);
+	  free (hashcodes);
+
+	  if (bucketcount == 0)
+	    return FALSE;
+
+	  elf_hash_table (info)->bucketcount = bucketcount;
+
+	  s = bfd_get_section_by_name (dynobj, ".hash");
+	  BFD_ASSERT (s != NULL);
+	  hash_entry_size = elf_section_data (s)->this_hdr.sh_entsize;
+	  s->size = ((2 + bucketcount + dynsymcount) * hash_entry_size);
+	  s->contents = bfd_zalloc (output_bfd, s->size);
+	  if (s->contents == NULL)
+	    return FALSE;
 
-      bfd_put (8 * hash_entry_size, output_bfd, bucketcount, s->contents);
-      bfd_put (8 * hash_entry_size, output_bfd, dynsymcount,
-	       s->contents + hash_entry_size);
+	  bfd_put (8 * hash_entry_size, output_bfd, bucketcount, s->contents);
+	  bfd_put (8 * hash_entry_size, output_bfd, dynsymcount,
+		   s->contents + hash_entry_size);
+	}
+
+      if (info->emit_gnu_hash)
+	{
+	  size_t i, cnt;
+	  unsigned char *contents;
+	  struct collect_gnu_hash_codes cinfo;
+	  bfd_size_type amt;
+	  size_t bucketcount;
+
+	  memset (&cinfo, 0, sizeof (cinfo));
 
-      elf_hash_table (info)->bucketcount = bucketcount;
+	  /* Compute the hash values for all exported symbols.  At the same
+	     time store the values in an array so that we could use them for
+	     optimizations.  */
+	  amt = dynsymcount * 2 * sizeof (unsigned long int);
+	  cinfo.hashcodes = bfd_malloc (amt);
+	  if (cinfo.hashcodes == NULL)
+	    return FALSE;
+
+	  cinfo.hashval = cinfo.hashcodes + dynsymcount;
+	  cinfo.min_dynindx = -1;
+	  cinfo.output_bfd = output_bfd;
+
+	  /* Put all hash values in HASHCODES.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_collect_gnu_hash_codes, &cinfo);
+
+	  bucketcount
+	    = compute_bucket_count (info, cinfo.hashcodes, cinfo.nsyms);
+
+	  if (bucketcount == 0)
+	    {
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  amt = bucketcount * sizeof (unsigned long int) * 2;
+	  cinfo.counts = bfd_malloc (amt);
+	  if (cinfo.counts == NULL)
+	    {
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  /* Determine how often each hash bucket is used.  */
+	  memset (cinfo.counts, 0, bucketcount * sizeof (cinfo.counts[0]));
+	  for (i = 0; i < cinfo.nsyms; ++i)
+	    ++cinfo.counts[cinfo.hashcodes[i] % bucketcount];
+
+	  s = bfd_get_section_by_name (dynobj, ".gnu.hash");
+	  BFD_ASSERT (s != NULL);
+	  cinfo.indx = cinfo.counts + bucketcount;
+	  cinfo.symindx = dynsymcount - cinfo.nsyms;
+	  for (i = 0, cnt = cinfo.symindx; i < bucketcount; ++i)
+	    if (cinfo.counts[i] != 0)
+	      {
+		cinfo.indx[i] = cnt;
+		cnt += cinfo.counts[i];
+	      }
+	  BFD_ASSERT (cnt == dynsymcount);
+	  cinfo.bucketcount = bucketcount;
+	  cinfo.local_indx = cinfo.min_dynindx;
+
+	  s->size = (2 + bucketcount + cinfo.nsyms) * 4;
+	  contents = bfd_zalloc (output_bfd, s->size);
+	  if (contents == NULL)
+	    {
+	      free (cinfo.counts);
+	      free (cinfo.hashcodes);
+	      return FALSE;
+	    }
+
+	  s->contents = contents;
+	  bfd_put_32 (output_bfd, bucketcount, contents);
+	  bfd_put_32 (output_bfd, cinfo.symindx, contents + 4);
+	  contents += 8;
+
+	  for (i = 0; i < bucketcount; ++i)
+	    {
+	      if (cinfo.counts[i] == 0)
+		bfd_put_32 (output_bfd, ~0, contents);
+	      else
+		bfd_put_32 (output_bfd, cinfo.indx[i] - cinfo.symindx,
+			    contents);
+	      contents += 4;
+	    }
+
+	  cinfo.contents = contents;
+
+	  /* Renumber dynamic symbols, populate .gnu.hash section.  */
+	  elf_link_hash_traverse (elf_hash_table (info),
+				  elf_renumber_gnu_hash_syms, &cinfo);
+
+	  free (cinfo.counts);
+	  free (cinfo.hashcodes);
+	}
 
       s = bfd_get_section_by_name (dynobj, ".dynstr");
       BFD_ASSERT (s != NULL);
@@ -6663,9 +6893,6 @@ elf_link_output_extsym (struct elf_link_
     {
       size_t bucketcount;
       size_t bucket;
-      size_t hash_entry_size;
-      bfd_byte *bucketpos;
-      bfd_vma chain;
       bfd_byte *esym;
 
       sym.st_name = h->dynstr_index;
@@ -6679,15 +6906,23 @@ elf_link_output_extsym (struct elf_link_
 
       bucketcount = elf_hash_table (finfo->info)->bucketcount;
       bucket = h->u.elf_hash_value % bucketcount;
-      hash_entry_size
-	= elf_section_data (finfo->hash_sec)->this_hdr.sh_entsize;
-      bucketpos = ((bfd_byte *) finfo->hash_sec->contents
-		   + (bucket + 2) * hash_entry_size);
-      chain = bfd_get (8 * hash_entry_size, finfo->output_bfd, bucketpos);
-      bfd_put (8 * hash_entry_size, finfo->output_bfd, h->dynindx, bucketpos);
-      bfd_put (8 * hash_entry_size, finfo->output_bfd, chain,
-	       ((bfd_byte *) finfo->hash_sec->contents
-		+ (bucketcount + 2 + h->dynindx) * hash_entry_size));
+
+      if (finfo->hash_sec != NULL)
+	{
+	  size_t hash_entry_size;
+	  bfd_byte *bucketpos;
+	  bfd_vma chain;
+
+	  hash_entry_size
+	    = elf_section_data (finfo->hash_sec)->this_hdr.sh_entsize;
+	  bucketpos = ((bfd_byte *) finfo->hash_sec->contents
+		       + (bucket + 2) * hash_entry_size);
+	  chain = bfd_get (8 * hash_entry_size, finfo->output_bfd, bucketpos);
+	  bfd_put (8 * hash_entry_size, finfo->output_bfd, h->dynindx, bucketpos);
+	  bfd_put (8 * hash_entry_size, finfo->output_bfd, chain,
+		   ((bfd_byte *) finfo->hash_sec->contents
+		    + (bucketcount + 2 + h->dynindx) * hash_entry_size));
+	}
 
       if (finfo->symver_sec != NULL && finfo->symver_sec->contents != NULL)
 	{
@@ -7861,7 +8096,7 @@ bfd_elf_final_link (bfd *abfd, struct bf
     {
       finfo.dynsym_sec = bfd_get_section_by_name (dynobj, ".dynsym");
       finfo.hash_sec = bfd_get_section_by_name (dynobj, ".hash");
-      BFD_ASSERT (finfo.dynsym_sec != NULL && finfo.hash_sec != NULL);
+      BFD_ASSERT (finfo.dynsym_sec != NULL);
       finfo.symver_sec = bfd_get_section_by_name (dynobj, ".gnu.version");
       /* Note that it is OK if symver_sec is NULL.  */
     }
@@ -8621,6 +8856,9 @@ bfd_elf_final_link (bfd *abfd, struct bf
 	    case DT_HASH:
 	      name = ".hash";
 	      goto get_vma;
+	    case DT_GNU_HASH:
+	      name = ".gnu.hash";
+	      goto get_vma;
 	    case DT_STRTAB:
 	      name = ".dynstr";
 	      goto get_vma;
--- include/elf/common.h.jj	2006-02-17 15:36:26.000000000 +0100
+++ include/elf/common.h	2006-06-22 10:43:21.000000000 +0200
@@ -338,6 +338,7 @@
 #define SHT_LOOS	0x60000000	/* First of OS specific semantics */
 #define SHT_HIOS	0x6fffffff	/* Last of OS specific semantics */
 
+#define SHT_GNU_HASH	0x6ffffff6	/* GNU style symbol hash table */
 #define SHT_GNU_LIBLIST	0x6ffffff7	/* List of prelink dependencies */
 
 /* The next three section types are defined by Solaris, and are named
@@ -577,6 +578,7 @@
 #define DT_VALRNGHI	0x6ffffdff
 
 #define DT_ADDRRNGLO	0x6ffffe00
+#define DT_GNU_HASH	0x6ffffef5
 #define DT_TLSDESC_PLT	0x6ffffef6
 #define DT_TLSDESC_GOT	0x6ffffef7
 #define DT_GNU_CONFLICT	0x6ffffef8
--- include/bfdlink.h.jj	2006-04-07 17:17:29.000000000 +0200
+++ include/bfdlink.h	2006-06-22 11:11:20.000000000 +0200
@@ -324,6 +324,12 @@ struct bfd_link_info
   /* TRUE if unreferenced sections should be removed.  */
   unsigned int gc_sections: 1;
 
+  /* TRUE if .hash section should be created.  */
+  unsigned int emit_hash: 1;
+
+  /* TRUE if .gnu.hash section should be created.  */
+  unsigned int emit_gnu_hash: 1;
+
   /* What to do with unresolved symbols in an object file.
      When producing executables the default is GENERATE_ERROR.
      When producing shared libraries the default is IGNORE.  The
--- binutils/readelf.c.jj	2006-05-30 16:13:54.000000000 +0200
+++ binutils/readelf.c	2006-07-03 19:30:54.000000000 +0200
@@ -135,6 +135,7 @@ static unsigned long dynamic_syminfo_off
 static unsigned int dynamic_syminfo_nent;
 static char program_interpreter[64];
 static bfd_vma dynamic_info[DT_JMPREL + 1];
+static bfd_vma dynamic_info_DT_GNU_HASH;
 static bfd_vma version_info[16];
 static Elf_Internal_Ehdr elf_header;
 static Elf_Internal_Shdr *section_headers;
@@ -1501,6 +1502,7 @@ get_dynamic_type (unsigned long type)
     case DT_GNU_CONFLICTSZ: return "GNU_CONFLICTSZ";
     case DT_GNU_LIBLIST: return "GNU_LIBLIST";
     case DT_GNU_LIBLISTSZ: return "GNU_LIBLISTSZ";
+    case DT_GNU_HASH:	return "GNU_HASH";
 
     default:
       if ((type >= DT_LOPROC) && (type <= DT_HIPROC))
@@ -2571,6 +2573,7 @@ get_section_type_name (unsigned int sh_t
     case SHT_INIT_ARRAY:	return "INIT_ARRAY";
     case SHT_FINI_ARRAY:	return "FINI_ARRAY";
     case SHT_PREINIT_ARRAY:	return "PREINIT_ARRAY";
+    case SHT_GNU_HASH:		return "GNU_HASH";
     case SHT_GROUP:		return "GROUP";
     case SHT_SYMTAB_SHNDX:	return "SYMTAB SECTION INDICIES";
     case SHT_GNU_verdef:	return "VERDEF";
@@ -6228,6 +6231,15 @@ process_dynamic_section (FILE *file)
 	    }
 	  break;
 
+	case DT_GNU_HASH:
+	  dynamic_info_DT_GNU_HASH = entry->d_un.d_val;
+	  if (do_dynamic)
+	    {
+	      print_vma (entry->d_un.d_val, PREFIX_HEX);
+	      putchar ('\n');
+	    }
+	  break;
+
 	default:
 	  if ((entry->d_tag >= DT_VERSYM) && (entry->d_tag <= DT_VERNEEDNUM))
 	    version_info[DT_VERSIONTAGIDX (entry->d_tag)] =
@@ -6903,6 +6915,9 @@ process_symbol_table (FILE *file)
   bfd_vma nchains = 0;
   bfd_vma *buckets = NULL;
   bfd_vma *chains = NULL;
+  bfd_vma ngnubuckets = 0;
+  bfd_vma *gnubuckets = NULL;
+  bfd_vma *gnuchains = NULL;
 
   if (! do_syms && !do_histogram)
     return 1;
@@ -7282,6 +7297,145 @@ process_symbol_table (FILE *file)
       free (chains);
     }
 
+  if (do_histogram && dynamic_info_DT_GNU_HASH)
+    {
+      unsigned char nb[8];
+      bfd_vma i, maxchain = 0xffffffff;
+      unsigned long *lengths;
+      unsigned long *counts;
+      unsigned long hn;
+      unsigned long maxlength = 0;
+      unsigned long nzero_counts = 0;
+      unsigned long nsyms = 0;
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file, dynamic_info_DT_GNU_HASH,
+				     sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      if (fread (nb, 8, 1, file) != 1)
+	{
+	  error (_("Failed to read in number of buckets\n"));
+	  return 0;
+	}
+
+      ngnubuckets = byte_get (nb, 4);
+
+      gnubuckets = get_dynamic_data (file, ngnubuckets, 4);
+
+      if (gnubuckets == NULL)
+	return 0;
+
+      for (i = 0; i < ngnubuckets; i++)
+	if (gnubuckets[i] != 0xffffffff
+	    && (maxchain == 0xffffffff || gnubuckets[i] > maxchain))
+	  maxchain = gnubuckets[i];
+
+      if (maxchain == 0xffffffff)
+	return 0;
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file,
+				     dynamic_info_DT_GNU_HASH
+				     + 4 * (2 + ngnubuckets + maxchain),
+				     sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      do
+	{
+	  if (fread (nb, 4, 1, file) != 1)
+	    {
+	      error (_("Failed to determine last chain length\n"));
+	      return 0;
+	    }
+
+	  if (maxchain + 1 == 0)
+	    return 0;
+
+	  ++maxchain;
+	}
+      while ((byte_get (nb, 4) & 1) == 0);
+
+      if (fseek (file,
+		 (archive_file_offset
+		  + offset_from_vma (file,
+				     dynamic_info_DT_GNU_HASH
+				     + 4 * (2 + ngnubuckets), sizeof nb)),
+		 SEEK_SET))
+	{
+	  error (_("Unable to seek to start of dynamic information"));
+	  return 0;
+	}
+
+      gnuchains = get_dynamic_data (file, maxchain, 4);
+
+      if (gnuchains == NULL)
+	return 0;
+
+      lengths = calloc (ngnubuckets, sizeof (*lengths));
+      if (lengths == NULL)
+	{
+	  error (_("Out of memory"));
+	  return 0;
+	}
+
+      printf (_("\nHistogram for `.gnu.hash' bucket list length (total of %lu buckets):\n"),
+	      (unsigned long) ngnubuckets);
+      printf (_(" Length  Number     %% of total  Coverage\n"));
+
+      for (hn = 0; hn < ngnubuckets; ++hn)
+	if (gnubuckets[hn] != 0xffffffff)
+	  {
+	    bfd_vma off, length = 1;
+
+	    for (off = gnubuckets[hn]; (gnuchains[off] & 1) == 0; ++off)
+	      ++length;
+	    lengths[hn] = length;
+	    if (length > maxlength)
+	      maxlength = length;
+	    nsyms += length;
+	  }
+
+      counts = calloc (maxlength + 1, sizeof (*counts));
+      if (counts == NULL)
+	{
+	  error (_("Out of memory"));
+	  return 0;
+	}
+
+      for (hn = 0; hn < ngnubuckets; ++hn)
+	++counts[lengths[hn]];
+
+      if (ngnubuckets > 0)
+	{
+	  unsigned long j;
+	  printf ("      0  %-10lu (%5.1f%%)\n",
+		  counts[0], (counts[0] * 100.0) / ngnubuckets);
+	  for (j = 1; j <= maxlength; ++j)
+	    {
+	      nzero_counts += counts[j] * j;
+	      printf ("%7lu  %-10lu (%5.1f%%)    %5.1f%%\n",
+		      j, counts[j], (counts[j] * 100.0) / ngnubuckets,
+		      (nzero_counts * 100.0) / nsyms);
+	    }
+	}
+
+      free (counts);
+      free (lengths);
+      free (gnubuckets);
+      free (gnuchains);
+    }
+
   return 1;
 }
 


	Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-07-03 21:05 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-28 17:21 [PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement Jakub Jelinek
2006-06-28 19:10 ` Paul Eggert
2006-06-28 20:25   ` Roland McGrath
2006-06-28 21:40   ` Jakub Jelinek
2006-06-28 21:46     ` Paul Eggert
2006-06-28 21:49       ` Roland McGrath
2006-06-29  0:35       ` Ulrich Drepper
2006-06-29 19:39 ` Michael Meeks
2006-06-29 21:52   ` Jakub Jelinek
2006-07-03 15:12     ` DT_GNU_HASH: reducing working set Michael Meeks
2006-07-03 15:59       ` Jakub Jelinek
2006-07-03 18:18         ` Michael Meeks
2006-07-03 21:05           ` [PATCH] DT_GNU_HASH: reducing working set ... (take 2) Jakub Jelinek
2006-06-30 23:55   ` DT_GNU_HASH: ~ 50% dynamic linking improvement Ulrich Drepper
2006-07-03  9:26   ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).