Hi! The following patches introduce an optional ELF hash section replacement, optimized for speed and data cache accesses, which in our tests improves dynamic linking by about 50%. Prelinking of course eliminates the costs altogether but if prelinked apps use dlopen they benefit for this portion. This is incidently where some apps today incur high costs today. The initial design was done by Ulrich Drepper and was discussed with Michael Meeks a few months ago as well. But nothing came off of these discussions and the proposed approach is quite different. The binutils patch adds a new option to ld, --hash-style, which allows selection of which hash sections to emit. ld can either emit the old style SHT_HASH section, or the new style SHT_GNU_HASH, or both (and perhaps in the future there could be a mode in which it would emit both, but SHT_HASH for slow compatibility only (minimal number of buckets)). The .gnu.hash section uses always sh_entsize 4 (so doesn't repeat the historic mistakes on Alpha/s390x with .hash). The first word there is nbuckets like in .hash section, followed by nbuckets offsets into the chains area of the new section. If the offset is 0xffffffff, it means there are no defined symbol names with hash % nbuckets equal to the offset's position. Otherwise, offset N means the corresponding chain starts at offset (1 + nbuckets + N) * 4 into .gnu.hash section. The chain are does not mirror in size the symbol table anymore. Each chain starts with a symbol index word, followed by chain length word and then length times hash value of the corresponding symbol name. If DT_GNU_HASH is present, .dynsym must be sorted, so that all symbols with the same name hash value % nbuckets are grouped together. So, if some chain contains symindx0 3 hash0 hash1 hash2 then dynamic symbol symindx0 has name hash hash0, symindx0+1 has hash1 and symindx0+2 has hash2 and (hash0%nbuckets)==(hash1%nbuckets)==(hash2%nbuckets). The hash function used is static uint_fast32_t dl_new_hash (const char *s) { uint_fast32_t h = 5381; for (unsigned char c = *s; c != '\0'; c = *++s) h = h * 33 + c; return h & 0xffffffff; } (Dan Bernstein's string hash function posted eons ago on comp.lang.c.) For an unsuccessful lookup (the usual case), the dynamic linker has to read the buckets[hash % nbuckets] (one cache line) and then go through the chain if it is not 0xffffffff, comparing each hashN with hash and as the hashN values are 32-bit and the hashing function spreads the strings well, it is very likely that if the hash is equal, then we have found the symbol (and just need to verify .dynsym/.dynstr), if it is different from all hashN values in the chain, ld.so can go on with another shared library. Usually the whole chain will be in one cache line or at most 2 cache lines. We have tested a bunch of different hash functions on the set of ~ 530000 unique dynamic symbols in one Linux installation and Dan Bernstein's hash had the fewest collisions (only 29), with very short maximum common prefixes for the colliding symbols (just 3 chars) and is also very cheap to compute (h * 33 is (h << 5) + h), also both ld -O1 and non-optimizing ld hash sizing gave good results with that hash function. The standard SysV ELF hash function gave bad results, on the same set of symbols had 1726 collisions and some of the colliding symbols had very long common name prefixes. One of the reasons could be that SysV ELF hash function is 28 bit, while Bernstein's is 32 bit. Attached stats file shows the biggest benchmark we used (a proglet that attempts to do all dynamic linking OpenOffice.org 2.0.3 swriter.bin does), most of the libraries (except about 8 libs from the total of 146 libraries used by the testcase) were relinked with -Wl,--hash-style=both and glibc on top of the DT_GNU_HASH support patch had also a hack, where if LD_X=1 was in environment, it would pretend DT_GNU_HASH is not present to allow easier benchmarking. The LD_X support will not be in the final glibc patch. Ok for trunk? Jakub