OK it took me 2 hours to make this work, and right now I do not want to deal with figuring out the rest of it because I don't even have my compsci AAS and have NO algorithms experience so this is brute force learning and I am learning that debugging algorithms written by horrible, horrible first-year compsci students is IRRITATING. ........ frustration vented, let's continue. (yes this message is REALLY ugly, and confusing, sorry. My thoughts are unorganized and I don't have enough data for a good argument here YET.) A while ago I proposed replacing the symbol strings (what section is that, .dynstr?) with patricia tries instead of hash chains (which as I understand are just linked lists jammed in a hash bucket) because of Drepper's paper about DSOs and common prefixes. Anyway, the gist of the idea is that I'm assuming the current symbol string structure is as follows: [Hash Table]--->[Bucket]-->[Symbol foobar] |->[Symbol foobaz] |->[Symbol foogonk]... I'm looking at instead [Hash Table]--->[Bucket]-->[Symbol foo]-->[ba]-->[r] | |->[z] |->[gonk] Such that when we have common prefixes, at the point where the strings diverge, we simply have to follow a branch. I wrote a small test program (attached) that builds a patricia trie and then dumps it, giving an analysis of the maximum depth (branches taken) and the longest common prefix. Well, Drepper's right at least. libgtk shows 10 branches when I patricia trie up the symbols, and a maximum common prefix of 35 characters; where with libgtkmm I get 16 branches and the biggest common prefix it can find is 229 characters. Landing these in a hash bucket is wasteful; to beat it I'll need a branch that's less intensive than 3.5 character comparisons... or not. If we assume there's only 2 entries sharing the common prefix, that's great. Except, when we branch, we have AT LEAST two entries sharing THIS common prefix. If we branch AGAIN, then we have at least THREE entries sharing the first prefix, and at least TWO entries sharing the shorter prefix (at the first branch). So if we branch TEN TIMES, there's a prefix shared 10 times, a longer one shared 9 times, a longer one shared 8 times, ... a longer one shared twice, and then a unique string. I have no idea what the implications are though I can guess there are branches at namespace and class when dealing with C++ symbols. (the real theoretical minimum is not 35 / 10; it's (1+2+3+4+5+6+7+8+9 +35) / 10 == 80/10 == 8. It doesn't much matter) The number of branches is easily reduced by reducing the number of items in the hash bucket (if there's 5 items, you can only branch in 5 places and probably won't approach that) and using Patricia Tries in place of hash chains instead of replacing the whole hash table. This is probably a good idea, since the only space-efficient storage methods I can come up with would be O(n) for n=num_branches operations per branch (each operation is 1 character comparison*; on top of that is one additional long addition.) *(comparing two strings, each character comparison is a compare and a long addition to iterate in a loop) So far I do NOT have a way to store this such that the branches are guaranteed less intensive than 8 character comparisons; and don't have real numbers on what I need. I'm still thinking this is probably a win since we're unlikely to branch 8 times from a common prefix and even if we do we're suddenly faced with a situation where we CAN assume we're winning something*; Drepper are you around, I could use your input. *(every extra direction we can take at a branch represents one more comparison of however many characters long the prefix is; so if we branch 8 ways after a prefix of 2 characters, we're spending 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 to avoid (8 + 7 + 6 + 5 + 4 + 3 + 2 + 1) * 2... somebody verify this for me, I think I just did something awesome and that shouldn't be possible) For those who want to keep trying, my crappy analysis program is attached. I used the below command to divine some useful numbers: readelf -sW /usr/lib/libgtk.so | tail -n +4 | awk '{print $8}' | grep -v '^$' | ./ptree (In case you haven't noticed yet, I trust my intuitions to fill in where my technical knowledge is lacking; and then try to find out if anyone else has something solid for me. -- John Moser