On Tue, 2006-08-22 at 01:47 -0400, John Moser wrote: [...] > I'm still thinking this is probably a win > since we're unlikely to branch 8 times from a common prefix and even if > we do we're suddenly faced with a situation where we CAN assume we're > winning something*; Drepper are you around, I could use your input. So I had some math run. You have N string elements each sharing a common prefix of X characters. If you arrange these in a trie, you have N branches to follow after the prefix. Each direction tested in a branch is identical in operation to a character comparison. (I designed the storage format that way specifically.) Every string is searched for. The linker eventually goes through all symbols, taking every possible search path. If the strings are in a linked list, you compare the common prefixes 1+2 +...+N times, making X comparisons each time, totaling (1+2+...+N)*X or X*((N+1) * N)/2 character comparisons. If the strings are in a trie, you compare the common prefixes N times and compare each branch direction ((N+1)*N)/2 times, totaling N*X + ((N +1)*N)/2 branches. These simplify to (N*(1 + N)*X)/2 (linked list) and (N*(1 + N))/2 + N*X (patricia trie). A little help from #math (who helped me figure out the simplified versions) and we get: <+Cale> % Reduce[{x*((n + 1)*n)/2 > n*x + ((n + 1)*n)/2, x > 0, n > 0}] Cale: x > 1 && n > (1 + x)/(-1 + x) Now here's some interesting thoughts: X = 2; N > 3/1 (3) X = 3; N > 4/2 (2) X = 4; N > 5/3 (1.6) Once we've passed prefixes 4 characters long, any branch will be a gain because N will be 2 or more. Also the more directions we branch in, the more time we save; for example, take a prefix 3 characters long (let's say namespace std, which is probably like _Z22std anyway) and branch it 3, 5, and 10 ways. LL PT 3: 18 15 5: 45 30 10: 165 85 Or let's try a branch between 2 with prefixes 3, 5, 10, 20, and 50 long: LL PT 3: 9 9 5: 15 13 10: 30 23 20: 60 43 50: 150 103 In other words, this is harmless if prefixes are longer than 2 characters; and a gain if prefixes are longer than 3 characters OR the number of symbols in the bucket sharing that number of prefixes is greater than 2. This is EXTREMELY likely. Of course, this is still all in theory. But the numbers are pretty. > -- John Moser