public inbox for cgen@sourceware.org
 help / color / mirror / Atom feed
* hash function for assembly hash table
@ 2001-03-26 15:20 Ben Elliston
  0 siblings, 0 replies; 2+ messages in thread
From: Ben Elliston @ 2001-03-26 15:20 UTC (permalink / raw)
  To: cgen

Doing some debugging yesterday, I noticed that the cgen-generated
assembler uses a very simplistic hash function when building the hash
table used in assembly.  It uses:

      mnemonic[0] % 127

As you might expect, the mnemonics for many assembly languages are
clustered -- for example, there might be a dozen "load" (`l')
instructions and a dozen corresponding "store" (`s') instructions.
Some architectures feature multimedia instructions in which *every*
instruction is prefixed by `m'.

To see how bad this is, I did some plotting to show how the
distribution looks in the hash table for a chosen port:

    http://www.air.net.au/~bje/char0-mod127.ps

By changing the hash function to something more slightly
sophisticated:

     sum (all mnemonic chars) % 127

This yields a *much* better distribution with a lower average number
of entries per hash value:

    http://www.air.net.au/~bje/ascii-sum-mod127.ps

Any objections if we pursue a hash function like this?

Ben

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: hash function for assembly hash table
       [not found] <15039.52892.996310.33439.cygnus.local.cgen@scooby.apac.redhat.com>
@ 2001-03-26 15:30 ` Frank Ch. Eigler
  0 siblings, 0 replies; 2+ messages in thread
From: Frank Ch. Eigler @ 2001-03-26 15:30 UTC (permalink / raw)
  To: Ben Elliston; +Cc: cgen

bje wrote:

: Doing some debugging yesterday, I noticed that the cgen-generated
: assembler uses a very simplistic hash function when building the hash
: table used in assembly.  It uses:
:       mnemonic[0] % 127

This is just the default.

: [...]
: By changing the hash function to something more slightly
: sophisticated:
:      sum (all mnemonic chars) % 127
: This yields a *much* better distribution [...]

One problem with this is that it's hard for the target-independent
code to identify the end of the mnemonic.  There exist instruction
sets which separate the mnemonic proper with dots from various
optional appended operands.  (I suppose cgen might have enough
information to deduce the boundaries.)

In any case, for an even better hash function (with a collision rate
much nearer 0%), see the "Improving gas error messages" thread on this
very mailing list about four months ago.


- FChE

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2001-03-26 15:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-03-26 15:20 hash function for assembly hash table Ben Elliston
     [not found] <15039.52892.996310.33439.cygnus.local.cgen@scooby.apac.redhat.com>
2001-03-26 15:30 ` Frank Ch. Eigler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).