public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: John Moser <john.r.moser@gmail.com>
To: libc-alpha@sourceware.org
Cc: binutils <binutils@sourceware.org>
Subject: Re: Patricia Trie pseudo-test
Date: Tue, 22 Aug 2006 11:45:00 -0000	[thread overview]
Message-ID: <1156229049.5373.99.camel@localhost> (raw)
In-Reply-To: <1156225627.5373.83.camel@localhost>

[-- Attachment #1: Type: text/plain, Size: 2457 bytes --]

On Tue, 2006-08-22 at 01:47 -0400, John Moser wrote:

[...]
> I'm still thinking this is probably a win
> since we're unlikely to branch 8 times from a common prefix and even if
> we do we're suddenly faced with a situation where we CAN assume we're
> winning something*; Drepper are you around, I could use your input.

So I had some math run.


You have N string elements each sharing a common prefix of X characters.

If you arrange these in a trie, you have N branches to follow after the
prefix.

Each direction tested in a branch is identical in operation to a
character comparison.  (I designed the storage format that way
specifically.)

Every string is searched for.  The linker eventually goes through all
symbols, taking every possible search path.

If the strings are in a linked list, you compare the common prefixes 1+2
+...+N times, making X comparisons each time, totaling (1+2+...+N)*X or
X*((N+1) * N)/2 character comparisons.

If the strings are in a trie, you compare the common prefixes N times
and compare each branch direction ((N+1)*N)/2 times, totaling N*X + ((N
+1)*N)/2 branches.

These simplify to (N*(1 + N)*X)/2 (linked list) and (N*(1 + N))/2 + N*X
(patricia trie).

A little help from #math (who helped me figure out the simplified
versions) and we get:

<+Cale> % Reduce[{x*((n + 1)*n)/2 > n*x + ((n + 1)*n)/2, x > 0, n > 0}]
<mbot> Cale: x > 1 && n > (1 + x)/(-1 + x)

Now here's some interesting thoughts:

 X = 2; N > 3/1 (3)
 X = 3; N > 4/2 (2)
 X = 4; N > 5/3 (1.6)

Once we've passed prefixes 4 characters long, any branch will be a gain
because N will be 2 or more.  Also the more directions we branch in, the
more time we save; for example, take a prefix 3 characters long (let's
say namespace std, which is probably like _Z22std anyway) and branch it
3, 5, and 10 ways.
    LL     PT
3:  18     15
5:  45     30
10: 165    85

Or let's try a branch between 2 with prefixes 3, 5, 10, 20, and 50 long:
    LL     PT
3:  9      9
5:  15     13
10: 30     23
20: 60     43
50: 150    103

In other words, this is harmless if prefixes are longer than 2
characters; and a gain if prefixes are longer than 3 characters OR the
number of symbols in the bucket sharing that number of prefixes is
greater than 2.  This is EXTREMELY likely.

Of course, this is still all in theory.  But the numbers are pretty.
> 
-- 
John Moser <john.r.moser@gmail.com>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

      reply	other threads:[~2006-08-22  6:44 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-22  7:34 John Moser
2006-08-22 11:45 ` John Moser [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1156229049.5373.99.camel@localhost \
    --to=john.r.moser@gmail.com \
    --cc=binutils@sourceware.org \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).