public inbox for gcc@gcc.gnu.org
* Hash Function for "switch statement"
@ 2010-03-14 20:48 Jae Hyuk Kwak
  2010-03-15  3:12 ` Basile Starynkevitch
       [not found] ` <20100317200410.GA13807@hungry-tiger.westford.ibm.com>
  0 siblings, 2 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-14 20:48 UTC (permalink / raw)
  To: gcc

Hello, GCC developers.

I'm not sure whether this is the right mailing list for this topic,
but I am giving it a shot.

Last week I was pondering a way to derive enum values from other unique
values such as strings and integers.
As usual, my thoughts led me to a hash table.

In addition, I realized that a switch statement can be compiled with a
"hash table" rather than just being lowered to a chain of "else if"
tests. Besides the usual "jump table", a hash table can increase speed,
and it can also alleviate the problem of a huge jump table.
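
Roughly, the lowering I have in mind looks like the hand-written sketch
below (the hash function, table size, and helper names are just
placeholders I made up, not what a compiler would necessarily emit):

/* A switch lowered to a hash-table dispatch -- rough sketch. */
enum { TABLE_SIZE = 64 };             /* power of two chosen at compile time */
struct entry { int key; void (*handler)(void); };
static const struct entry table[TABLE_SIZE];  /* filled in by the compiler */

static unsigned hash(int key) { return (unsigned)key * 2654435761u; }

void dispatch(int a)
{
    const struct entry *e = &table[hash(a) & (TABLE_SIZE - 1)];
    if (e->key == a && e->handler)    /* reject false positives */
        e->handler();
    /* else: fall through to the default case */
}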

I explained this in more detail on my blog; I hope someone will have a
look.

http://wrice.blogspot.com/2010/03/use-perfect-hash-table-for-alternative.html

I wanted to share my idea and hear what others think.
Please let me know where I should post instead, if this mailing list is
not the right place.
Thank you for reading.

Sincerely.

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-14 20:48 Hash Function for "switch statement" Jae Hyuk Kwak
@ 2010-03-15  3:12 ` Basile Starynkevitch
  2010-03-15  8:00   ` Jae Hyuk Kwak
       [not found] ` <20100317200410.GA13807@hungry-tiger.westford.ibm.com>
  1 sibling, 1 reply; 27+ messages in thread
From: Basile Starynkevitch @ 2010-03-15  3:12 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: gcc

Jae Hyuk Kwak wrote:
> Hello, GCC developers.
> 
> In addition, I found that switch statement can use "Hash Table" rather
> than just replacing with "else if".
> Besides using "jump table", "Hash Table" can increase speed.
> Hash table idea can alleviate the problem of a huge size of jump table as well.
> 


It is much more complex than that. Read the  paper "A Superoptimizer 
Analysis of Multiway Branch Code Generation" by Roger A. Sayle in GCC 
summit 2008 proceedings. www.gccsummit.org/2008/gcc-2008-proceedings.pdf

Regards.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-15  3:12 ` Basile Starynkevitch
@ 2010-03-15  8:00   ` Jae Hyuk Kwak
  2010-03-16  1:41     ` Jae Hyuk Kwak
  2010-03-16  6:00     ` Dave Korn
  0 siblings, 2 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-15  8:00 UTC (permalink / raw)
  To: Basile Starynkevitch; +Cc: gcc

Thank you Basile.
The paper you pointed to is exactly what I wanted to find.
It summarizes switch optimization very well, and it enlightened me.
I am also glad that it mentions imperfect and perfect hashing for switch
statements.

Unfortunately, the superoptimizer approach the paper suggests doesn't
adopt any of the hash-table options. In addition, the paper only skims
the advantages of a perfect hash table and doesn't mention minimal
perfect hash tables at all.

I think that for "speed" optimization, a perfect hash is the best
candidate after a jump table, when it is applicable. I am currently
working on PlayStation 3 game development, and we want to avoid branch
operations. Since 3D games value speed relatively more than space, I am
still interested in perfect hashing for switch statements, because it
is more generic than the other techniques the paper mentions and
requires (possibly) zero branching. That is not only my personal
preference but also the preference of most game developers.

As is usual in 3D game development, we have pre-calculation steps for
graphics data. I am thinking of adding one more step to that process
which generates a perfect hash table. A manual implementation of a
perfect hash table as an alternative to a switch statement does not
seem hard, so I could do it myself without too much pain, but it would
make my job easier if gcc could play that role.

I hope to see gcc adopt some of the better switch optimization
strategies in the near future.

I hadn't heard about "MELT" before and still don't know what exactly
it is. Can it deal with this kind of problem?

Thank you anyway.
Without your reply I wouldn't have been this satisfied. :-)

Regards,

Jay


On Sun, Mar 14, 2010 at 3:59 PM, Basile Starynkevitch
<basile@starynkevitch.net> wrote:
> Jae Hyuk Kwak wrote:
>>
>> Hello, GCC developers.
>>
>> In addition, I found that switch statement can use "Hash Table" rather
>> than just replacing with "else if".
>> Besides using "jump table", "Hash Table" can increase speed.
>> Hash table idea can alleviate the problem of a huge size of jump table as
>> well.
>>
>
>
> It is much more complex than that. Read the  paper "A Superoptimizer
> Analysis of Multiway Branch Code Generation" by Roger A. Sayle in GCC summit
> 2008 proceedings. www.gccsummit.org/2008/gcc-2008-proceedings.pdf
>
> Regards.
> --
> Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
> email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-15  8:00   ` Jae Hyuk Kwak
@ 2010-03-16  1:41     ` Jae Hyuk Kwak
  2010-03-16  7:12       ` Basile Starynkevitch
  2010-03-16  6:00     ` Dave Korn
  1 sibling, 1 reply; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-16  1:41 UTC (permalink / raw)
  To: gcc

I found that I had made a wrong assumption on this issue.

In order to use a perfect hash table, we need to know every key value
at compile time, but the key values are determined at run time, so the
idea is not feasible.

On my project, however, the set of key values was fixed, which is why I
got confused on that point.

I'm sorry...

Jay



On Mon, Mar 15, 2010 at 12:19 AM, Jae Hyuk Kwak <wrice127@gmail.com> wrote:
> Thank you Basile.
> The article you pointed is exactly what I wanted to find.
> The paper summarized switch optimization very well, and it enlightened me.
> I am also glad that it mentioned imperfect and perfect hash for switch
> statement.
>
> Unfortunately, the super-optimization that the paper suggests doesn't
> adopt any of hash table ways. In addition, the paper skimmed the
> advantage of perfect hash table and it didn't even mention minimal
> perfect hash table at all.
>
> I think that for the "speed" optimization, perfect hash way is the
> best candidate after jump table if it is applicable. I am currently
> working on PlayStation3 game development, and we don't want to use
> branch operation. Since 3d games value relatively more on speed than
> space, I am still interested on perfect hash for switch statement,
> because it is more generic than others the paper mentioned. It will
> also require (possibly) zero branching. It would be not only my
> personal preference but also the favorite of most game developers.
>
> As usual for 3d game programming process, we have pre-calculation
> steps for graphics data. During the process I am thinking to add one
> more process that generates perfect hash table. The manual
> implementation of perfect hash table as an alternative mean for switch
> statement does not seem hard. So I may do it by myself without too
> much pain, but It can make my job easier if gcc can play the role.
>
> I wish to see gcc can adopt any of better switch optimization ways in
> near future.
>
> I haven't heard about "MELT" before and still don't know what exactly
> it is. Is it able to deal with this kind of problem?
>
> Thank you anyway.
> Without the reply mail, I couldn't be satisfied this much. :-)
>
> Regards,
>
> Jay
>
>
> On Sun, Mar 14, 2010 at 3:59 PM, Basile Starynkevitch
> <basile@starynkevitch.net> wrote:
>> Jae Hyuk Kwak wrote:
>>>
>>> Hello, GCC developers.
>>>
>>> In addition, I found that switch statement can use "Hash Table" rather
>>> than just replacing with "else if".
>>> Besides using "jump table", "Hash Table" can increase speed.
>>> Hash table idea can alleviate the problem of a huge size of jump table as
>>> well.
>>>
>>
>>
>> It is much more complex than that. Read the  paper "A Superoptimizer
>> Analysis of Multiway Branch Code Generation" by Roger A. Sayle in GCC summit
>> 2008 proceedings. www.gccsummit.org/2008/gcc-2008-proceedings.pdf
>>
>> Regards.
>> --
>> Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
>> email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
>> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
>> *** opinions {are only mines, sont seulement les miennes} ***
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-15  8:00   ` Jae Hyuk Kwak
  2010-03-16  1:41     ` Jae Hyuk Kwak
@ 2010-03-16  6:00     ` Dave Korn
  1 sibling, 0 replies; 27+ messages in thread
From: Dave Korn @ 2010-03-16  6:00 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: Basile Starynkevitch, gcc

On 15/03/2010 07:19, Jae Hyuk Kwak wrote:

> I think that for the "speed" optimization, perfect hash way is the
> best candidate after jump table if it is applicable. 

  It should be pointed out that your article contains a false assumption
about how fast the various options are relative to each other:

> For example, when we have a piece of code like this:
> 
>     if( a == 1 ) doSomething1();
>     else if( a == 2 ) doSomething2();
>     else if( a == 3 ) doSomething3();
>     ....
>     else if( a == 40000 ) doSomething40000();
> 
> For each line, CPU, more precisely ALU, will evaluate each statement: "a ==
> 1", "a == 2" and so on. In other words, CPU need to calculate 40000 times
> for the same value "a".

  This is not true unless a is a volatile variable.  The compiler will
evaluate a once and then compare it against the constants.
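
  In other words, the chain above compiles to something roughly like
this hand-written sketch (just to illustrate the point):

    int tmp = a;                    /* "a" is read exactly once */
    if      (tmp == 1) doSomething1();
    else if (tmp == 2) doSomething2();
    else if (tmp == 3) doSomething3();
    /* ... the cached value is compared against constants; "a" is not
       re-evaluated 40000 times. */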

> More intuitive representation for this testing can be like this:
> 
>     switch( a )
>     {
>     case 1: doSomething1(); break;
>     case 2: doSomething2(); break;
>     case 3: doSomething3(); break;
>     ...
>     case 40000: doSomething4000(); break;
>     }
> 
> This "switch statement" gives us an illusion that CPU will evaluate the
> value of "a" only one time.
> 
> According to an article on CodeGuru, however, "switch" statement will be
> replaced by "else if"

  It may, but in this case, the result is even better; the compiler will only
ever evaluate "a" once even if it *is* volatile.

    cheers,
      DaveK


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-16  1:41     ` Jae Hyuk Kwak
@ 2010-03-16  7:12       ` Basile Starynkevitch
  0 siblings, 0 replies; 27+ messages in thread
From: Basile Starynkevitch @ 2010-03-16  7:12 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: gcc

Jae Hyuk Kwak wrote:
>> I haven't heard about "MELT" before and still don't know what exactly
>> it is. Is it able to deal with this kind of problem?

MELT is a GCC branch and a GCC plugin. It provides a Lispy
domain-specific language for coding GCC extensions. More details are on
the GCC wiki, in particular
http://gcc.gnu.org/wiki/MiddleEndLispTranslator and
http://gcc.gnu.org/wiki/MELT%20tutorial

I believe that MELT would be a good tool for Jae to experiment with his
ideas on switch statements. He probably needs to implement a new GIMPLE
pass which transforms one GIMPLE representation into another, and MELT
is well suited for that. Feel free to ask for more.

Cheers


-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
       [not found] ` <20100317200410.GA13807@hungry-tiger.westford.ibm.com>
@ 2010-03-18  6:39   ` Jae Hyuk Kwak
  2010-03-18 11:20     ` Andrew Haley
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-18  6:39 UTC (permalink / raw)
  To: Michael Meissner, gcc

On Wed, Mar 17, 2010 at 1:04 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> Note, that many hash tables are computed by the modulus operation, which is
> often fairly expensive (and on machines without a hardware divide unit,
> requiring a function call).  I would expect many switch statements would slow
> down if you switch to a hash operation that used modolus.
>

Hi Michael,

I agree that the cost of the modulus operation can be high, but it can
be even higher if we use a long chain of "else if".
Consider a program that has about 400 cases in a single switch
statement.

The cost of the modulus operation is a fixed price, so there should be
a certain point at which that price is lower than the cost of the
else-if chain.

If a jump table is possible it can be the choice, but a jump table is
not always feasible, depending on the "case" values.

Thank you for the reply.

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-18  6:39   ` Jae Hyuk Kwak
@ 2010-03-18 11:20     ` Andrew Haley
       [not found]     ` <20100318151753.GA4065@hungry-tiger.westford.ibm.com>
  2010-03-19 19:14     ` Andrew Haley
  2 siblings, 0 replies; 27+ messages in thread
From: Andrew Haley @ 2010-03-18 11:20 UTC (permalink / raw)
  To: gcc

On 03/18/2010 05:22 AM, Jae Hyuk Kwak wrote:
> On Wed, Mar 17, 2010 at 1:04 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> Note, that many hash tables are computed by the modulus operation, which is
>> often fairly expensive (and on machines without a hardware divide unit,
>> requiring a function call).  I would expect many switch statements would slow
>> down if you switch to a hash operation that used modolus.
> 
> I agree that the cost of modulation can be high, but it can be even
> higher if we use a bunch of "else if".
> Consider the situation that a program has about 400 cases on a single
> switch statement.
> 
> The cost of modulation will be fixed price so that there should be a
> certain point that the price bis lower than else if statements.

Sure, but how often is that going to be cheaper than, for example,
using a tree rather than a hash table?

Andrew.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
       [not found]     ` <20100318151753.GA4065@hungry-tiger.westford.ibm.com>
@ 2010-03-19  5:26       ` Jae Hyuk Kwak
  2010-03-19 12:26         ` Frank Ch. Eigler
                           ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-19  5:26 UTC (permalink / raw)
  To: Michael Meissner, gcc

Hi Michael,

Thank you for the details.


On Thu, Mar 18, 2010 at 8:17 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
>> > Note, that many hash tables are computed by the modulus operation, which is
>> > often fairly expensive (and on machines without a hardware divide unit,
>> > requiring a function call).  I would expect many switch statements would slow
>> > down if you switch to a hash operation that used modolus.
>>
>> I agree that the cost of modulation can be high, but it can be even
>> higher if we use a bunch of "else if".
>> Consider the situation that a program has about 400 cases on a single
>> switch statement.
>
> However the compiler doesn't do 400 if-then-else's, instead it generates a
> decision tree using less than/greater than type tests so that log2
> comparison/branches are made.  So if there are 400 cases, it means there are
> probably 10 comparisons/branches done for any given value (and on some
> architectures, the last comparison for equality can be eliminated).  Looking at
> the i386.c tunings, you see divides are in the 66 - 74 cycle ranges for popular
> current chips.  If a comparison/branch is say 6 cycles, that means it is still
> faster on average to do the tree of if's than to do a modulus, load, and
> branch for 400 or so switch elements.

If I understood correctly, your point is that O(log N) is fast enough
because the size of a switch is small in practice.
But I am still not convinced that using a hash table would not increase
the speed. As we know, a hash table requires only O(N).
There must be some point at which O(N) beats O(log N), depending on the
size of N and the constant cost of the O(N) approach.

Is it true that the current implementation of gcc on i386 optimizes
switch statements with a decision tree?
I was hoping somebody could confirm this.

I tried to find the specific implementation in the file i386.c,
but it is a very big file and I still have only limited knowledge of
gcc. If you can point me to a more specific part of the implementation,
I may be able to follow what you meant.


>> If jump table is possible, it can be a choice, but jump table is not
>> always feasible depending on the values on "case".
>
> There are other strategies, for example if most values are clustered together,
> you can use if's for the outlying values, and then use a jump table for the
> values clustered together.  When I did the Data General C compiler many years
> ago, this was a win in our particular environment, because there were often
> code that did a switch on errno with a 0 case, and all of the E values.  In the
> DG environment, the E values were high because each language had its own range
> of error codes, and the C language was a relative late comer.

The paper Basile pointed to summarizes some of the strategies you might
be referring to.
As I mentioned earlier, the paper takes a branching-tree approach but
does not dig much into the hash-table part.
And as of 2008, gcc did not yet seem to use even a decision tree for
switch statements.


Thanks

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19  5:26       ` Jae Hyuk Kwak
@ 2010-03-19 12:26         ` Frank Ch. Eigler
  2010-03-19 18:11         ` Jae Hyuk Kwak
       [not found]         ` <20100319165443.GA9993@hungry-tiger.westford.ibm.com>
  2 siblings, 0 replies; 27+ messages in thread
From: Frank Ch. Eigler @ 2010-03-19 12:26 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: Michael Meissner, gcc

Jae Hyuk Kwak <wrice127@gmail.com> writes:

> [...]
> Is that true that current implementation of gcc on i386 optimizes
> switch statement with decision tree?
> I was expecting somebody who can confirm this.

You can see for yourself by writing a variety of switch{} statements
and observing the assembly code (-fverbose-asm) and/or the dumps of the
intermediate output (-d).  To see more of the spectrum of
possibilities, make the tables large/small, sparse/clustered/dense, and
the case blocks small/large/fallthrough.
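
For example, something along these lines (a minimal experiment; the
exact flags and case values are only a suggestion):

    /* sw.c -- vary the case values and compare the generated code */
    extern void f0(void), f1(void), f2(void), f3(void);
    void test(int a)
    {
      switch (a)
      {
      case 0:    f0(); break;
      case 7:    f1(); break;
      case 100:  f2(); break;
      case 9000: f3(); break;
      }
    }

    gcc -O2 -S -fverbose-asm sw.c      # then read sw.s
    gcc -O2 -c -fdump-tree-all sw.c    # then read the GIMPLE dump files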

> I tried to find the specific implementation on the file, i386.c.
> But it is very big file and I have only limited knowledge of gcc yet.
> If you can point more specific implementation part, I may be able to
> catch up what you meant.

See gcc/stmt.c for starters.

- FChE

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19  5:26       ` Jae Hyuk Kwak
  2010-03-19 12:26         ` Frank Ch. Eigler
@ 2010-03-19 18:11         ` Jae Hyuk Kwak
       [not found]         ` <20100319165443.GA9993@hungry-tiger.westford.ibm.com>
  2 siblings, 0 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-19 18:11 UTC (permalink / raw)
  To: gcc

Let me correct my mistake.

> If I understood correctly, your point is that O(log N) is fast enough
> because the size of switch is small in practice.
> But I am still not convinced that using hash table would not increase the speed.
> As we know hash table requires O(N) only.
> There must be some point that O(N) bits O(logN), depending on the size
> of N and the constant cost of O(N).

The O(N) there should have been O(1).

Sorry.

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-18  6:39   ` Jae Hyuk Kwak
  2010-03-18 11:20     ` Andrew Haley
       [not found]     ` <20100318151753.GA4065@hungry-tiger.westford.ibm.com>
@ 2010-03-19 19:14     ` Andrew Haley
  2 siblings, 0 replies; 27+ messages in thread
From: Andrew Haley @ 2010-03-19 19:14 UTC (permalink / raw)
  To: gcc

On 03/18/2010 05:22 AM, Jae Hyuk Kwak wrote:
> On Wed, Mar 17, 2010 at 1:04 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> Note, that many hash tables are computed by the modulus operation, which is
>> often fairly expensive (and on machines without a hardware divide unit,
>> requiring a function call).  I would expect many switch statements would slow
>> down if you switch to a hash operation that used modolus.
>>
> 
> I agree that the cost of modulation can be high, but it can be even
> higher if we use a bunch of "else if".
> Consider the situation that a program has about 400 cases on a single
> switch statement.
> 
> The cost of modulation will be fixed price so that there should be a
> certain point that the price bis lower than else if statements.
> 
> If jump table is possible, it can be a choice, but jump table is not
> always feasible depending on the values on "case".

You don't have to speculate, you can measure.

gcc has the "labels as values" extension which you can use to simulate 
a hash-based switch statement.  So, please try it and see if it's faster.
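
For what it's worth, here is a quick sketch of what such a simulation
might look like (the hash is just a stand-in; a real test would need a
realistic set of case values):

    /* A hash-dispatched "switch" using GCC's computed gotos. */
    extern void do1(void), do2(void), do3(void);

    void test(int a)
    {
      static const void *target[4] = { &&Ldef, &&L1, &&L2, &&L3 };
      static const int   key[4]    = { -1, 1, 2, 3 };
      unsigned h = (unsigned)a & 3;     /* stand-in hash function */
      if (key[h] != a) goto Ldef;       /* reject false positives */
      goto *target[h];
    L1:   do1(); return;
    L2:   do2(); return;
    L3:   do3(); return;
    Ldef: /* default case */ return;
    }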

Andrew.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
       [not found]         ` <20100319165443.GA9993@hungry-tiger.westford.ibm.com>
@ 2010-03-19 21:26           ` Jae Hyuk Kwak
  2010-03-19 21:30             ` Robert Dewar
  0 siblings, 1 reply; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-19 21:26 UTC (permalink / raw)
  To: Michael Meissner, gcc

Hi Michael,

Thank you for the detailed response.


On Fri, Mar 19, 2010 at 9:54 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Thu, Mar 18, 2010 at 05:10:18PM -0700, Jae Hyuk Kwak wrote:
>> Hi Michael,
>>
>> Thank you for the details.
>> If I understood correctly, your point is that O(log N) is fast enough
>> because the size of switch is small in practice.
>> But I am still not convinced that using hash table would not increase the speed.
>> As we know hash table requires O(N) only.
>> There must be some point that O(N) bits O(logN), depending on the size
>> of N and the constant cost of O(N).
>
> Sure, but you need to do a cost analysis whether it is worthwhile.  I suspect
> it would not win until the number of cases is about 2000 or so, assuming you
> can't use a jump table.  I'm not against the optimization per se, but I don't
> want any current switch statement to slow down because you just assume it is
> faster and the cost analysis used to decide which way to do it did not account
> for all costs.  The only way to solve this is to implement it and run various
> benchmarks on real hardware.
>
> Modulus is a rather expensive operation in almost all processors.  Some
> processors additionally do not provide a modulus operation, and you have to get
> it by doing a divide, multiply, and subtract (and some machines don't have a
> divide at all, and you have to do a loop of shift/adds to replicate the
> divide).

I agree that we need to do performance benchmarks whenever we talk
about a performance issue.
The actual results often surprise us.

As for the modulus operation, it does not need to be a divide or a
hardware modulus instruction.
Let me give you an example here:
13 % 2 = 1
13 & (2-1) = 1

In other words, we can use the AND operator as a substitute for the
modulus operator.
In that case the size of the hash table has to be restricted to a power
of two: 2, 4, 8, 16, and so on.
But that is not a big problem, because by the nature of a hash table we
don't fill the table completely.
For example, if we have 257 cases in a switch statement, then we need a
hash table with 512 buckets.
That gives a 50% filled hash table, but a 50% load factor is usual.

> In addition to doing the modulus operation, you need to have an entry that
> gives the value in the switch statement as well as the jump address, and do an
> equality comparison against that value, otherwise you will get false positives.

To prevent false positives, we need to store the key value, as you
correctly pointed out.
So the final process is: first compute the hash key, then index into
the hash table, and finally compare the stored key to confirm that the
value is correct.
In code it would look something like this.

#include <map>

typedef void (*CB)();
void doSomething1();
void doSomething40000();

void dispatch( int a )
{
  // std::map is a tree, but it stands in for the hash table here
  static std::map< int, CB > hashTable;
  hashTable[ 1 ] = doSomething1;
  hashTable[ 40000 ] = doSomething40000;
  std::map< int, CB >::const_iterator found = hashTable.find( a );
  if( found != hashTable.end() ) (*(found->second))();
}


> I believe the code has been in the compiler since 4.3 at least.

Thank you for the information.


Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 21:26           ` Jae Hyuk Kwak
@ 2010-03-19 21:30             ` Robert Dewar
  2010-03-19 21:53               ` Jae Hyuk Kwak
  2010-03-19 22:24               ` Dave Korn
  0 siblings, 2 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 21:30 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: Michael Meissner, gcc

Jae Hyuk Kwak wrote:

> For the issue of Modulus operation, it does not need to be divide or
> hardware modulus operation.
> Let me give you an example here:
> 13 % 2 = 1
> 13 & (2-1) = 1

Yes, of course, we all know that, and of course the compiler does
these simple optimizations. However, for computing hashes you
typically want modulus with OTHER than a power of 2 to involve
all bits.
> 
> In English, we can use AND operator as substitutes of modulus operator.
> In the way, the size of hash table should be restricted to certain
> values like 2, 4, 8, 16, and so on.
> But it is not a big problem because by the nature of Hash Table, we
> don't make the table full.
> For example, we have 257 cases on a switch statement, then we need a
> hash table whose bucket size is 512.
> It will have 50% filled hash table, but 50% filled hash table is usual.

Again, yes, of course, but typically power of two size for a hash table
followed by an and that takes only the low order bits is likely to be
a very poor hash choice.

> In order to prevent the false positives, we need to store the key
> value as you correctly pointed.
> So the final process will be first to the hash key value calculation
> and then go to the hash table, and compare whether the value is
> correct or not.

Again, you can assume we all know the elementary algorithms for
managing hash tables.

One thing to realize is that hashing only has good average performance.
So your O(N) is talking about average case performance, whereas the
O(NlogN) for a tree search is worst case. That's comparing apples and
oranges. It is worrisome to have the possibility of really bad
performance for a particular bad luck case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 21:30             ` Robert Dewar
@ 2010-03-19 21:53               ` Jae Hyuk Kwak
  2010-03-19 22:18                 ` Robert Dewar
  2010-03-19 22:24               ` Dave Korn
  1 sibling, 1 reply; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-19 21:53 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc

Hi Robert,


On Fri, Mar 19, 2010 at 2:13 PM, Robert Dewar <dewar@adacore.com> wrote:
> Jae Hyuk Kwak wrote:
>
>> For the issue of Modulus operation, it does not need to be divide or
>> hardware modulus operation.
>> Let me give you an example here:
>> 13 % 2 = 1
>> 13 & (2-1) = 1
>
> Yes, of course, we all know that, and of course the compiler does
> these simple optimizations. However, for computing hashes you
> typically want modulus with OTHER than a power of 2 to involve
> all bits.

The point I am making is that we don't need the "OTHER".

For example, when we have 200 cases in a switch, we don't need to use
200 as the modulus; we can just use 256, because there is no reason to
stick to the exact number of cases when sizing the hash table.


>>
>> In English, we can use AND operator as substitutes of modulus operator.
>> In the way, the size of hash table should be restricted to certain
>> values like 2, 4, 8, 16, and so on.
>> But it is not a big problem because by the nature of Hash Table, we
>> don't make the table full.
>> For example, we have 257 cases on a switch statement, then we need a
>> hash table whose bucket size is 512.
>> It will have 50% filled hash table, but 50% filled hash table is usual.
>
> Again, yes, of course, but typically power of two size for a hash table
> followed by an and that takes only the low order bits is likely to be
> a very poor hash choice.

I think we have different assumptions about the hash function.
The hash function I am thinking of is not just taking the lower bits;
the AND operation takes place after the hash function is done.
The hash function can be one of the functions on this web page:
http://www.concentric.net/~Ttwang/tech/inthash.htm

Let me copy and paste the function (it is Java; ">>>" is the unsigned
right shift) to prevent misunderstanding.
public int hash32shift(int key)
{
  key = ~key + (key << 15); // key = (key << 15) - key - 1;
  key = key ^ (key >>> 12);
  key = key + (key << 2);
  key = key ^ (key >>> 4);
  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
  key = key ^ (key >>> 16);
  return key;
}

// HashBucket is assumed to hold the original key and a function pointer.
int value = given_from_user_at_runtime();
int hashKey = hash32shift( value );
int actualHashKey = hashKey & ( 512 - 1 );
HashBucket * bucket = hashTable[ actualHashKey ];
if( bucket->key == value ) (*(bucket->function))();


>> In order to prevent the false positives, we need to store the key
>> value as you correctly pointed.
>> So the final process will be first to the hash key value calculation
>> and then go to the hash table, and compare whether the value is
>> correct or not.
>
> Again, you can assume we all know the elementary algorithms for
> managing hash tables.
>
> One thing to realize is that hashing only has good average performance.
> So your O(N) is talking about average case performance, whereas the
> O(NlogN) for a tree search is worst case. That's comparing apples and
> oranges. It is worrisome to have the possibility of really bad
> performance for a particular bad luck case.

As I have already corrected, the performance of the hash-table approach
is O(1), not O(N).

You may think it is not a proper comparison, but I am indeed comparing
"tree search" with a hash table.
Let me cite a sentence from Wikipedia:
http://en.wikipedia.org/wiki/Hash_table

"In many situations, hash tables turn out to be more efficient than
search trees or any other table lookup structure."


Thanks for your response anyway.

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 21:53               ` Jae Hyuk Kwak
@ 2010-03-19 22:18                 ` Robert Dewar
  2010-03-19 22:43                   ` Dave Korn
  2010-03-19 22:57                   ` Jae Hyuk Kwak
  0 siblings, 2 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 22:18 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: gcc

Jae Hyuk Kwak wrote:
> Hi Robert,
> 
> 
> On Fri, Mar 19, 2010 at 2:13 PM, Robert Dewar <dewar@adacore.com> wrote:
>> Jae Hyuk Kwak wrote:
>>
>>> For the issue of Modulus operation, it does not need to be divide or
>>> hardware modulus operation.
>>> Let me give you an example here:
>>> 13 % 2 = 1
>>> 13 & (2-1) = 1
>> Yes, of course, we all know that, and of course the compiler does
>> these simple optimizations. However, for computing hashes you
>> typically want modulus with OTHER than a power of 2 to involve
>> all bits.
> 
> The point that I made is we don't need the "OTHER".
> 
> For example, when we have 200 cases in a switch, we don't need to use
> 200 as OTHER, but we can just use 256.
> Because there is no reason to stick on the number of cases for the
> case of hash table.

You miss my point: doing a mod with 256 is an AWFUL hash algorithm,
since it only uses the low-order 8 bits! But if you are just doing
the AND to get the final result, then yes, of course, that again is
totally familiar to everyone involved with compilers.

> I think we have different assumption on the hash function.
> The hash function I am thinking of is not just taking lower bits.
> The AND operator part takes place after hash function is done.
> Those hash functions can be one of functions on this web page:
> http://www.concentric.net/~Ttwang/tech/inthash.htm

Sure that's a possibility

> Let me copy and paste the function to prevent misunderstanding.
> public int hash32shift(int key)
> {
>   key = ~key + (key << 15); // key = (key << 15) - key - 1;
>   key = key ^ (key >>> 12);
>   key = key + (key << 2);
>   key = key ^ (key >>> 4);
>   key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>   key = key ^ (key >>> 16);
>   return key;
> }
> 
> int value = given_from_user_at_runtime();
> int hashKey = hash32shift( value );
> int actualHashKey = hashKey & ( 512 - 1 );
> HashBucket * bucket = hashTable[ actualHashKey ];
> if( bucket->key == value ) (*(bucket->function)();
> 
> 
>>> In order to prevent the false positives, we need to store the key
>>> value as you correctly pointed.
>>> So the final process will be first to the hash key value calculation
>>> and then go to the hash table, and compare whether the value is
>>> correct or not.
>> Again, you can assume we all know the elementary algorithms for
>> managing hash tables.
>>
>> One thing to realize is that hashing only has good average performance.
>> So your O(N) is talking about average case performance, whereas the
>> O(NlogN) for a tree search is worst case. That's comparing apples and
>> oranges. It is worrisome to have the possibility of really bad
>> performance for a particular bad luck case.
> 
> As I have already corrected, the performance of HashTable way is O(1) not O(N).

Well, I was giving the performance of N executions, which is O(N) for
the hash table and O(NlogN) for the tree. If you just want to consider
a single execution, then the result is O(1) for the hash and O(logN)
for the tree search, so my comparison was consistent.
> 
> You may think it is not proper comparison, but I think comparing "tree
> search" with hash table.
> Let me cite a sentence from Wiki: http://en.wikipedia.org/wiki/Hash_table
> 
> "In many situations, hash tables turn out to be more efficient than
> search trees or any other table lookup structure."

Yes, of course; that's why all compilers use hash tables extensively.
I think you will find that people on this mailing list know all about
hash tables (I would not be surprised if someone here wrote the (rather
elementary) Wikipedia article). BTW, I would recommend a more thorough
source, e.g. Knuth's The Art of Computer Programming, for reading up on
hashing.

The above quote from Wikipedia is indeed misleading because it does not
draw a clear contrast between average behavior and worst-case behavior.
(The situation is similar to comparing quicksort with heapsort: they
are both NlogN on average, but the constant is smaller for quicksort,
so it is generally better, while the worst case is N**2 for quicksort
and NlogN for heapsort.)

So this does not get around the possibility of a bad-luck worst case.
It is one thing for a program to compile slowly because by bad luck it
has identifier names that hash badly; it is QUITE another to hit bad
luck at execution time which results in a particular program running
horribly slowly. I am dubious about introducing this kind of
uncertainty.

I trust this point is clear to you?

You seem to write as though the discovery of hash algorithms is new and
interesting in this context. In fact the use of hash tables for switch
statements is very old (BCPL comes to mind (*)), as is the understanding
of whether it is a useful approach. Along with others in this thread, I
very much doubt it is useful in practical cases. And you would certainly
have to do extensive, careful benchmarks to show it was a worthwhile
replacement, especially given the worst-case behavior.

(*) Here is a 1969 BCPL reference (more than 40 years old!)

> http://portal.acm.org/citation.cfm?id=1476880&dl=GUIDE&coll=GUIDE&CFID=80853141&CFTOKEN=76204023

A quote from this article:

> " ... the switch very efficiently, even constructing a hash table if this method of switching is ..."

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 21:30             ` Robert Dewar
  2010-03-19 21:53               ` Jae Hyuk Kwak
@ 2010-03-19 22:24               ` Dave Korn
  2010-03-19 23:09                 ` Robert Dewar
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Korn @ 2010-03-19 22:24 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Jae Hyuk Kwak, Michael Meissner, gcc

On 19/03/2010 21:13, Robert Dewar wrote:
> Jae Hyuk Kwak wrote:
> 
>> For the issue of Modulus operation, it does not need to be divide or
>> hardware modulus operation.

> Yes, of course, we all know that, and of course the compiler does
> these simple optimizations. However, for computing hashes you
> typically want modulus with OTHER than a power of 2 to involve
> all bits.

  Please, nobody bring up the old saw that prime numbers are a good choice for
hashtable sizes.  They aren't, they're a crude workaround for a poor hash
function.

>> For example, we have 257 cases on a switch statement, then we need a
>> hash table whose bucket size is 512.
>> It will have 50% filled hash table, but 50% filled hash table is usual.
> 
> Again, yes, of course, but typically power of two size for a hash table
> followed by an and that takes only the low order bits is likely to be
> a very poor hash choice.

  Only if there's something wrong with your hash function in the first place.
 If your hash has decent diffusion, all the output bits should be equally
affected by all the input bits, and it should make no difference which ones
you use.

> Again, you can assume we all know the elementary algorithms for
> managing hash tables.

  I think you need to refresh your memory about some of the more advanced
points though.

> One thing to realize is that hashing only has good average performance.
> So your O(N) is talking about average case performance, whereas the
> O(NlogN) for a tree search is worst case. That's comparing apples and
> oranges. It is worrisome to have the possibility of really bad
> performance for a particular bad luck case.

  That's not how minimal perfect hashing works, which was one of the main
suggestions.

    cheers,
      DaveK
-- 
(*) - http://burtleburtle.net/bob/hash/evahash.html
    - http://www.burtleburtle.net/bob/hash/doobs.html
    - http://burtleburtle.net/bob/hash/index.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:18                 ` Robert Dewar
@ 2010-03-19 22:43                   ` Dave Korn
  2010-03-19 23:28                     ` Robert Dewar
  2010-03-19 22:57                   ` Jae Hyuk Kwak
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Korn @ 2010-03-19 22:43 UTC (permalink / raw)
  To: Robert Dewar; +Cc: Jae Hyuk Kwak, gcc

On 19/03/2010 22:07, Robert Dewar wrote:

> You miss my point, doing a mod with 256 is an AWFUL hash algorithm
> since it only uses the low order 8 bits! 

  This statement is only true contingent on a number of significant
assumptions you haven't stated - assumptions which can easily be violated.

> I think you will find that people on this mailing list know all about
> hash tables 

  I think you should get down from that high horse before you come down with
an embarrassing bump.

> So this does not get around the possibility of a bad luck worst case.

  Perfect hashing does exactly that.  That's why it's "popular for hashing
keywords for compilers", and indeed why it "ought to be popular for optimizing
switch statements":

http://burtleburtle.net/bob/hash/perfect.html

    cheers,
      DaveK

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:18                 ` Robert Dewar
  2010-03-19 22:43                   ` Dave Korn
@ 2010-03-19 22:57                   ` Jae Hyuk Kwak
  2010-03-19 23:33                     ` Robert Dewar
  2010-03-19 23:33                     ` Robert Dewar
  1 sibling, 2 replies; 27+ messages in thread
From: Jae Hyuk Kwak @ 2010-03-19 22:57 UTC (permalink / raw)
  To: Robert Dewar; +Cc: gcc

Thanks for the fast reply,


On Fri, Mar 19, 2010 at 3:07 PM, Robert Dewar <dewar@adacore.com> wrote:
>> "In many situations, hash tables turn out to be more efficient than
>> search trees or any other table lookup structure."
>
> The above quote from Wikipedia is indeed misleading because it does
> not make a clearer contrast between average behavior and worst case
> behavior (the situation is similar to comparing quick sort with heap
> sort, they are both NlogN on average, but the constant is smaller
> for quick sort, so it is generally better, but the worst case is
> N**2 for quick sort and NlogN for heap sort.
>
> So this does not get around the possibility of a bad luck worst case.
> It is one thing for a program to compiler slowly because by bad luck
> it has identifier names that hash badly, it is QUITE another to hit
> bad luck at execution time which results in a particular program
> running horribly slow. I am dubious about introducing this kind of
> uncertainty.

I am not interested in making compile time shorter; I am talking about
speed at run time, not at compile time. I already mentioned that
earlier.

I assume that by "hash badly" you mean a hash function producing the
same result for different key values.
For a hash table used as an alternative to a switch statement, the
"hash badly" situation does not exist at run time, because the hash
table I am talking about is a perfect hash table.

Let's say we adopt this hash function.
It is also from the same web site.

public int hash32shiftmult(int key, int c2=0x27d4eb2d )
{
  key = (key ^ 61) ^ (key >>> 16);
  key = key + (key << 3);
  key = key ^ (key >>> 4);
  key = key * c2;
  key = key ^ (key >>> 15);
  return key;
}

Now we have a switch statement that has 400 cases or whatever.

switch ( valueFormUser )
{
   case 0: do1(); break;
   case 1: do2(); break;
   case 2: do3(); break;
   ...
   case 399: do399(); break;
   default: break;
}

If a collision happens, we change the value of "c2" until none of the
case values collide.
In other words, we change c2 until hash32shiftmult( 0, c2 ),
hash32shiftmult( 1, c2 ), hash32shiftmult( 2, c2 ), and so on are all
distinct.

Therefore we will have a "perfect hash table", so there is no collision
at run time.
In other words, we have a fixed constant cost for the hash-table
lookup; there is no "hash badly" case.
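
For example, the pre-calculation step could search for such a constant
with something like the sketch below (hand-written, using unsigned
arithmetic in place of Java's ">>>"; the key count and table size are
made up):

#define NKEYS   400
#define NBUCKET 512                     /* next power of two above 400 */

static unsigned hash(unsigned key, unsigned c2)
{
  key = (key ^ 61) ^ (key >> 16);
  key += key << 3;
  key ^= key >> 4;
  key *= c2;
  key ^= key >> 15;
  return key & (NBUCKET - 1);
}

/* May search for a long time (or forever) if no suitable c2 exists
   for this particular key set and hash shape. */
unsigned find_c2(const unsigned keys[NKEYS])
{
  for (unsigned c2 = 0x27d4eb2d; ; c2 += 2)     /* keep c2 odd */
  {
    char used[NBUCKET] = { 0 };
    int i, ok = 1;
    for (i = 0; i < NKEYS && ok; i++)
    {
      unsigned h = hash(keys[i], c2);
      if (used[h]) ok = 0; else used[h] = 1;
    }
    if (ok) return c2;                  /* perfect hash for this key set */
  }
}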

In fact, it will make compile time longer, if that is your concern.
But again, I am talking about run-time speed.


> A quote from this article:
>
>> " ... the switch very efficiently, even constructing a hash table if this
>> method of switching is ..."

Yes, it does. Such an old paper mentioned it, and we are still not
doing it, which makes me wonder why.

Thanks

Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:24               ` Dave Korn
@ 2010-03-19 23:09                 ` Robert Dewar
  2010-03-19 23:17                   ` Robert Dewar
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 23:09 UTC (permalink / raw)
  To: Dave Korn; +Cc: Jae Hyuk Kwak, Michael Meissner, gcc

Dave Korn wrote:

>   Please, nobody bring up the old saw that prime numbers are a good choice for
> hashtable sizes.  They aren't, they're a crude workaround for a poor hash
> function.

Well on a machine with a fast modulus operation, the crude workaround
is often the most efficient algorithm in practice.

>> One thing to realize is that hashing only has good average performance.
>> So your O(N) is talking about average case performance, whereas the
>> O(NlogN) for a tree search is worst case. That's comparing apples and
>> oranges. It is worrisome to have the possibility of really bad
>> performance for a particular bad luck case.
> 
>   That's not how minimal perfect hashing works, which was one of the main
> suggestions.

Right, in the case where you can do minimal perfect hashing, of course
the consideration of average case performance does not apply.
> 
>     cheers,
>       DaveK

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 23:09                 ` Robert Dewar
@ 2010-03-19 23:17                   ` Robert Dewar
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 23:17 UTC (permalink / raw)
  To: Dave Korn; +Cc: Jae Hyuk Kwak, Michael Meissner, gcc

By the way, in practice I have found that in many situations, the
input to hash functions is nowhere near pseudo-random, e.g. this
is very much true of identifiers in programs, so the best hash
algorithm is often one that is specialized for the particular
non-pseudo-random domain.

Of course in the case of switch statements, it is perfectly fine
to have a different algorithm for each switch statement (using
perfect hashing where appropriate). That's what the BCPL compiler
did (some 42 years ago :-))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:43                   ` Dave Korn
@ 2010-03-19 23:28                     ` Robert Dewar
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 23:28 UTC (permalink / raw)
  To: Dave Korn; +Cc: Jae Hyuk Kwak, gcc

Dave Korn wrote:
> On 19/03/2010 22:07, Robert Dewar wrote:
> 
>> You miss my point, doing a mod with 256 is an AWFUL hash algorithm
>> since it only uses the low order 8 bits! 
> 
>   This statement is only true contingent on a number of significant
> assumptions you haven't stated - assumptions which can easily be violated.

I did make the mistake of assuming that since JHK talked about 
efficiency of the mod operation he was assuming mod as part of
the hashing algorithm, rather than just a way of narrowing the
range of the result of an otherwise already pseudo-randomized hash.
> 
>> I think you will find that people on this mailing list know all about
>> hash tables 
> 
>   I think you should get down from that high horse before you come down with
> an embarrassing bump.
> 
>> So this does not get around the possibility of a bad luck worst case.
> 
>   Perfect hashing does exactly that.  That's why it's "popular for hashing
> keywords for compilers", and indeed why it "ought to be popular for optimizing
> switch statements":

I still doubt that it is worthwhile in practice; it will be interesting
to see whether actual measurements show otherwise. The experience with
the BCPL compiler was that, though clever, this approach was not
actually helpful on real programs in practice.

I can't see perfect hashing EVER being useful for hashing keywords in
compilers, since in practice, given good error detection, you don't
know whether you are looking for a keyword or an identifier in any
context. So almost always you just incorporate the hash entries for
keywords into the main identifier hash table. Are there really
compilers which work any other way? You have to treat nicely the cases
of programmers accidentally using keywords as identifiers and
accidentally misspelling keywords:

>      1. pakage X is
>         |
>         >>> incorrect spelling of keyword "package"
> 
>      2.    type Package is range 1 .. 10;
>                 |
>         >>> reserved word "package" cannot be used as identifier
> 
>      3. end X;

I really don't see any other approach

(hmmm, now I have to file a bug report about the inconsistent
terminology of keyword vs reserved word :-))


> 
> http://burtleburtle.net/bob/hash/perfect.html
> 
>     cheers,
>       DaveK

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:57                   ` Jae Hyuk Kwak
  2010-03-19 23:33                     ` Robert Dewar
@ 2010-03-19 23:33                     ` Robert Dewar
  1 sibling, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 23:33 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: gcc

Jae Hyuk Kwak wrote:

> Now we have a switch statement that has 400 cases or whatever.
> 
> switch ( valueFormUser )
> {
>    case 0: do1(); break;
>    case 1: do2(); break;
>    case 2: do3(); break;
>    ...
>    case 399: do399(); break;
>    default: break;

Well, clearly, if you have 400 cases like this, perfect hashing is NOT
going to help, since a simple jump table will do the job.

The case where perfect hashing would help is when there are
a bunch of values that are sparse and somewhat random in
nature, which is rare in practice.

Very often, even sparse cases have ranges which can be quickly
separated and then handled with jump tables. In any case the
most efficient way of handling arbitrary big cases will be some
kind of divide and conquer approach where you use the appropriate
algorithm at each stage.

Clearly you don't want to use a perfect hash for a case that
has cases 0 .. 500  and  55550000 .. 55550999, since simple if tests
will be better in this case.
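
For that last shape, one plausible lowering is just a couple of range
tests in front of two tables, along these lines (function-pointer
arrays standing in for the compiler's jump tables):

extern void (*low_table[501])(void);    /* cases 0 .. 500 */
extern void (*high_table[1000])(void);  /* cases 55550000 .. 55550999 */
extern void default_case(void);

void dispatch(long x)
{
  if (x >= 0 && x <= 500)
    low_table[x]();
  else if (x >= 55550000 && x <= 55550999)
    high_table[x - 55550000]();
  else
    default_case();
}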

On the other hand, if you have a bunch of 256 values and you find that
they are distinct in their last 8 bits, then the perfect hash is just a
single AND, and in this case clearly the best approach would be to do
the single AND followed by a jump table. But it is hard to believe that
such a case would occur in any context other than a test written to
exercise this particular clever optimization :-)
> 
> In the case the conflict happened, we change the value of "c2" until
> none of values of cases conflict.
> In other words, we change c2 value until we get "hash32shiftmult( 0,
> c2 ) != hash32shiftmult( 1, c2 ) != hash32shiftmult( 2, c2 )...."
> 
> Therefore we will have "Perfect hash table", so that there is no
> conflict during run-time.
> In other words, we will have fixed constant cost for hash table
> resolving; there is no 'hash badly" case.
> 
> In fact, it will make compile time longer, if it is your interest.
> But again, I am talking about run-time speed.
> 
> 
>> A quote from this article:
>>
>>> " ... the switch very efficiently, even constructing a hash table if this
>>> method of switching is ..."
> 
> Yes, it does. Such a old paper mentioned it and we are still not doing it.
> So it makes me wonder why.
> 
> Thanks
> 
> Jay

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-19 22:57                   ` Jae Hyuk Kwak
@ 2010-03-19 23:33                     ` Robert Dewar
  2010-03-19 23:33                     ` Robert Dewar
  1 sibling, 0 replies; 27+ messages in thread
From: Robert Dewar @ 2010-03-19 23:33 UTC (permalink / raw)
  To: Jae Hyuk Kwak; +Cc: gcc

Jae Hyuk Kwak wrote:

>> A quote from this article:
>>
>>> " ... the switch very efficiently, even constructing a hash table if this
>>> method of switching is ..."
> 
> Yes, it does. Such a old paper mentioned it and we are still not doing it.
> So it makes me wonder why.

Why? Perhaps because it is not helpful in practice ???

Perfect hashing is even older, first referenced in the early 1950s; I
am not sure of the exact date, but some time around then.

As I say, I am pretty sure that BCPL did use perfect hashing, but you
still have to show that it is helpful in practice. That is really the
only way to resolve the game of guessing whether it is worthwhile.

What you need to do is implement the perfect hashing possibility, then
run the modified compiler over a big chunk of real-life code, and see
whether you find any significant improvement (or degradation). You
might be able to concoct an artificial example where it helps, but that
is never a convincing way of justifying an optimization (though it is
all too commonly done :-))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-22 13:22 ` Robert Dewar
@ 2010-03-22 16:29   ` Andrew Haley
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Haley @ 2010-03-22 16:29 UTC (permalink / raw)
  Cc: gcc

On 03/22/2010 12:43 PM, Robert Dewar wrote:

> the code for computing the hash has to be taken into account, but
> nothing else than actual benchmarks will give an accurate
> comparison.

I agree.  I'd also like to point out that as it is not very difficult
to do these benchmarks, the proponent(s) should produce some numbers
before much more discussion takes place.

Andrew.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
  2010-03-22 12:44 Unruh, Erwin
@ 2010-03-22 13:22 ` Robert Dewar
  2010-03-22 16:29   ` Andrew Haley
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Dewar @ 2010-03-22 13:22 UTC (permalink / raw)
  To: Unruh, Erwin; +Cc: gcc

Unruh, Erwin wrote:
> Hi,
> 
> the discussion so far did omit one specific aspect. When comparing two implementations for a switch, you have to compare the full code. For the hash you have to include the code to calculate the hash function. This might be more code than a simple tree lookup.
> The example function:
> 
>> public int hash32shift(int key)
>> {
>>  key = ~key + (key << 15); // key = (key << 15) - key - 1;
>>  key = key ^ (key >>> 12);
>>  key = key + (key << 2);
>>  key = key ^ (key >>> 4);
>>  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>>  key = key ^ (key >>> 16);
>>  return key;
>> }
> 
> has 12 operations. Add a table and verification you get to about 18. That is worse than a tree search with 9 levels. So for all switches with less than 512 elements, the hash is not faster.

You can't just count operations on modern machines, so this conclusion
is not valid. Yes, of course the code for computing the hash has to be
taken into account, but nothing other than actual benchmarks will give
an accurate comparison.
> 
> 	Erwin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Hash Function for "switch statement"
@ 2010-03-22 12:44 Unruh, Erwin
  2010-03-22 13:22 ` Robert Dewar
  0 siblings, 1 reply; 27+ messages in thread
From: Unruh, Erwin @ 2010-03-22 12:44 UTC (permalink / raw)
  To: gcc

Hi,

The discussion so far has omitted one specific aspect. When comparing
two implementations of a switch, you have to compare the full code: for
the hash you have to include the code that calculates the hash
function, which might be more code than a simple tree lookup.
The example function:

>public int hash32shift(int key)
>{
>  key = ~key + (key << 15); // key = (key << 15) - key - 1;
>  key = key ^ (key >>> 12);
>  key = key + (key << 2);
>  key = key ^ (key >>> 4);
>  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>  key = key ^ (key >>> 16);
>  return key;
>}

has 12 operations. Add a table lookup and verification and you get to
about 18, which is worse than a tree search with 9 levels. So for all
switches with fewer than 512 elements, the hash is not faster.
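
(The rough arithmetic behind that estimate: 12 operations for the hash,
plus roughly 6 more for the table load and the verifying compare, gives
about 18; a balanced comparison tree costs about 2 operations -- a
compare and a branch -- per level, so 9 levels is also about 18
operations and already distinguishes 2^9 = 512 cases.)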

	Erwin

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-03-22 13:32 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-14 20:48 Hash Function for "switch statement" Jae Hyuk Kwak
2010-03-15  3:12 ` Basile Starynkevitch
2010-03-15  8:00   ` Jae Hyuk Kwak
2010-03-16  1:41     ` Jae Hyuk Kwak
2010-03-16  7:12       ` Basile Starynkevitch
2010-03-16  6:00     ` Dave Korn
     [not found] ` <20100317200410.GA13807@hungry-tiger.westford.ibm.com>
2010-03-18  6:39   ` Jae Hyuk Kwak
2010-03-18 11:20     ` Andrew Haley
     [not found]     ` <20100318151753.GA4065@hungry-tiger.westford.ibm.com>
2010-03-19  5:26       ` Jae Hyuk Kwak
2010-03-19 12:26         ` Frank Ch. Eigler
2010-03-19 18:11         ` Jae Hyuk Kwak
     [not found]         ` <20100319165443.GA9993@hungry-tiger.westford.ibm.com>
2010-03-19 21:26           ` Jae Hyuk Kwak
2010-03-19 21:30             ` Robert Dewar
2010-03-19 21:53               ` Jae Hyuk Kwak
2010-03-19 22:18                 ` Robert Dewar
2010-03-19 22:43                   ` Dave Korn
2010-03-19 23:28                     ` Robert Dewar
2010-03-19 22:57                   ` Jae Hyuk Kwak
2010-03-19 23:33                     ` Robert Dewar
2010-03-19 23:33                     ` Robert Dewar
2010-03-19 22:24               ` Dave Korn
2010-03-19 23:09                 ` Robert Dewar
2010-03-19 23:17                   ` Robert Dewar
2010-03-19 19:14     ` Andrew Haley
2010-03-22 12:44 Unruh, Erwin
2010-03-22 13:22 ` Robert Dewar
2010-03-22 16:29   ` Andrew Haley
