From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 8232 invoked by alias); 22 Aug 2006 06:44:15 -0000 Received: (qmail 8213 invoked by uid 22791); 22 Aug 2006 06:44:13 -0000 X-Spam-Check-By: sourceware.org Received: from wx-out-0506.google.com (HELO wx-out-0506.google.com) (66.249.82.238) by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 22 Aug 2006 06:44:12 +0000 Received: by wx-out-0506.google.com with SMTP id s19so1774915wxc for ; Mon, 21 Aug 2006 23:44:10 -0700 (PDT) Received: by 10.70.29.14 with SMTP id c14mr11057379wxc; Mon, 21 Aug 2006 23:44:10 -0700 (PDT) Received: from ?192.168.1.51? ( [68.33.112.13]) by mx.gmail.com with ESMTP id i20sm9336634wxd.2006.08.21.23.44.09; Mon, 21 Aug 2006 23:44:10 -0700 (PDT) Subject: Re: Patricia Trie pseudo-test From: John Moser To: libc-alpha@sourceware.org Cc: binutils In-Reply-To: <1156225627.5373.83.camel@localhost> References: <1156225627.5373.83.camel@localhost> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-4QGb980KktWXzqFqwV6T" Date: Tue, 22 Aug 2006 11:45:00 -0000 Message-Id: <1156229049.5373.99.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.7.91 Mailing-List: contact binutils-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: binutils-owner@sourceware.org X-SW-Source: 2006-08/txt/msg00236.txt.bz2 --=-4QGb980KktWXzqFqwV6T Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Content-length: 2394 On Tue, 2006-08-22 at 01:47 -0400, John Moser wrote: [...] > I'm still thinking this is probably a win > since we're unlikely to branch 8 times from a common prefix and even if > we do we're suddenly faced with a situation where we CAN assume we're > winning something*; Drepper are you around, I could use your input. So I had some math run. You have N string elements each sharing a common prefix of X characters. If you arrange these in a trie, you have N branches to follow after the prefix. Each direction tested in a branch is identical in operation to a character comparison. (I designed the storage format that way specifically.) Every string is searched for. The linker eventually goes through all symbols, taking every possible search path. If the strings are in a linked list, you compare the common prefixes 1+2 +...+N times, making X comparisons each time, totaling (1+2+...+N)*X or X*((N+1) * N)/2 character comparisons. If the strings are in a trie, you compare the common prefixes N times and compare each branch direction ((N+1)*N)/2 times, totaling N*X + ((N +1)*N)/2 branches. These simplify to (N*(1 + N)*X)/2 (linked list) and (N*(1 + N))/2 + N*X (patricia trie). A little help from #math (who helped me figure out the simplified versions) and we get: <+Cale> % Reduce[{x*((n + 1)*n)/2 > n*x + ((n + 1)*n)/2, x > 0, n > 0}] Cale: x > 1 && n > (1 + x)/(-1 + x) Now here's some interesting thoughts: X =3D 2; N > 3/1 (3) X =3D 3; N > 4/2 (2) X =3D 4; N > 5/3 (1.6) Once we've passed prefixes 4 characters long, any branch will be a gain because N will be 2 or more. Also the more directions we branch in, the more time we save; for example, take a prefix 3 characters long (let's say namespace std, which is probably like _Z22std anyway) and branch it 3, 5, and 10 ways. LL PT 3: 18 15 5: 45 30 10: 165 85 Or let's try a branch between 2 with prefixes 3, 5, 10, 20, and 50 long: LL PT 3: 9 9 5: 15 13 10: 30 23 20: 60 43 50: 150 103 In other words, this is harmless if prefixes are longer than 2 characters; and a gain if prefixes are longer than 3 characters OR the number of symbols in the bucket sharing that number of prefixes is greater than 2. This is EXTREMELY likely. Of course, this is still all in theory. But the numbers are pretty. >=20 --=20 John Moser --=-4QGb980KktWXzqFqwV6T Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part Content-length: 827 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) iQIVAwUAROqnuAs1xW0HCTEFAQKk7A//aMcFukMM81MxsQ214MIa1uYuCe762YmX zDWgFGUqz4AP6Iq16L2efWEaApUi16aJIoTqZn2wknvlzybm/c82suwCwZfPx8Fk LuAahBSHs44Lse555LyY1mOGzH0jzRkiTCimSpNIBllalsGpAdCk+a5jw4Kq0ElN cnmKUcImY7F1sERcyPzYbRp0wU/ape5vBwQreXgTgSFiJXZMonrVKg+tpmtXYvqY 1rZFcwQo5O7GlrAbq3+CdcCqywJswM6EqYD9aUFUPoycNCUKqXY+UVbus5H20RuH FPuwSEBbtP8CHAHSUab1jxFasRGsL9x0/XsDLT++NORCPQiePi2MPrZbuE892yJY XK8VXsjipU5zwr8BhaEnVegltp4BiTXK/Fucntg64IL+XVt34UXy+TwelzxVJZUI C8o8k/FXHkCNWbQcDKMGI/U0QC8uUR+ok3vd5kfsyCzwdF4ySAmmyPRBEGZ24Qa0 EOqZRZtyPKZF6ElaKQSWaCqKUGTeYpUMH/7ogjENp0BXGdWc/OBJmtvF1U1qqqll mYDzQRe78gyDUoCz789CJ5CSt0jjCTUeFaHQMIw8Ss3Szg/nOKXujr8B+PXTy3gs nOmNPGM9kXwJAwaj/0ckYNJ90RmIvsR6sFTM94ThMyvUmdGVGNUvKzLkYtrFMMyO DWdKFzIPuaE= =YKoJ -----END PGP SIGNATURE----- --=-4QGb980KktWXzqFqwV6T--