From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 27236 invoked by alias); 19 Mar 2010 22:55:37 -0000 Received: (qmail 27143 invoked by uid 22791); 19 Mar 2010 22:55:34 -0000 X-SWARE-Spam-Status: No, hits=-0.9 required=5.0 tests=AWL,BAYES_00,SARE_MSGID_LONG40,SPF_NEUTRAL X-Spam-Check-By: sourceware.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (140.186.70.10) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 19 Mar 2010 22:55:24 +0000 Received: from mail.gnu.org ([199.232.76.166]:44927 helo=mx10.gnu.org) by fencepost.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Nsl6A-0001Sn-LU for gcc@gnu.org; Fri, 19 Mar 2010 18:55:22 -0400 Received: from eggs.gnu.org ([140.186.70.92]:47640) by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1Nsl0D-0001cX-D0 for gcc@gnu.org; Fri, 19 Mar 2010 18:49:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1Nsl0B-00077H-Sl for gcc@gnu.org; Fri, 19 Mar 2010 18:49:13 -0400 Received: from mail-iw0-f175.google.com ([209.85.223.175]:59538) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Nsl0B-000778-Pn for gcc@gnu.org; Fri, 19 Mar 2010 18:49:11 -0400 Received: by iwn5 with SMTP id 5so2784131iwn.24 for ; Fri, 19 Mar 2010 15:49:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.168.139 with SMTP id u11mr433611iby.46.1269038950982; Fri, 19 Mar 2010 15:49:10 -0700 (PDT) In-Reply-To: <4BA3F5AD.8070600@adacore.com> References: <8ac33cd71003141343g7ae78185s378fd52205e2deb1@mail.gmail.com> <20100317200410.GA13807@hungry-tiger.westford.ibm.com> <8ac33cd71003172222i3be31fbah209f49bd83dfa9b3@mail.gmail.com> <20100318151753.GA4065@hungry-tiger.westford.ibm.com> <8ac33cd71003181710k263f6a8do6abd49d80e5213f8@mail.gmail.com> <20100319165443.GA9993@hungry-tiger.westford.ibm.com> <8ac33cd71003191358o15ad8e72tb14dc71adfea2066@mail.gmail.com> <4BA3E8EB.3070100@adacore.com> <8ac33cd71003191430w7f344f95x4541ca41609c33e3@mail.gmail.com> <4BA3F5AD.8070600@adacore.com> Date: Fri, 19 Mar 2010 22:57:00 -0000 Message-ID: <8ac33cd71003191549k37e3d360n174c18e93d80ffdc@mail.gmail.com> Subject: Re: Hash Function for "switch statement" From: Jae Hyuk Kwak To: Robert Dewar Cc: gcc@gnu.org Content-Type: text/plain; charset=UTF-8 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3) Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2010-03/txt/msg00321.txt.bz2 Thanks for the fast reply, On Fri, Mar 19, 2010 at 3:07 PM, Robert Dewar wrote: >> "In many situations, hash tables turn out to be more efficient than >> search trees or any other table lookup structure." > > The above quote from Wikipedia is indeed misleading because it does > not make a clearer contrast between average behavior and worst case > behavior (the situation is similar to comparing quick sort with heap > sort, they are both NlogN on average, but the constant is smaller > for quick sort, so it is generally better, but the worst case is > N**2 for quick sort and NlogN for heap sort. > > So this does not get around the possibility of a bad luck worst case. > It is one thing for a program to compiler slowly because by bad luck > it has identifier names that hash badly, it is QUITE another to hit > bad luck at execution time which results in a particular program > running horribly slow. I am dubious about introducing this kind of > uncertainty. I am not interested in making compile time shorter. I am here talking about the speed at run-time not at compile-time. I have already mentioned it earlier. I assume that by "hash badly" you mean a hash function generates an identical result even from different key values. For the hash table as alternative way of switch statement, it does not have "hash badly" situation at run-time. It is because the hash table I'm talking about is a perfect hash table. Let's say we adopt this hash function. It is also from the same web site. public int hash32shiftmult(int key, int c2=0x27d4eb2d ) { key = (key ^ 61) ^ (key >>> 16); key = key + (key << 3); key = key ^ (key >>> 4); key = key * c2; key = key ^ (key >>> 15); return key; } Now we have a switch statement that has 400 cases or whatever. switch ( valueFormUser ) { case 0: do1(); break; case 1: do2(); break; case 2: do3(); break; ... case 399: do399(); break; default: break; } In the case the conflict happened, we change the value of "c2" until none of values of cases conflict. In other words, we change c2 value until we get "hash32shiftmult( 0, c2 ) != hash32shiftmult( 1, c2 ) != hash32shiftmult( 2, c2 )...." Therefore we will have "Perfect hash table", so that there is no conflict during run-time. In other words, we will have fixed constant cost for hash table resolving; there is no 'hash badly" case. In fact, it will make compile time longer, if it is your interest. But again, I am talking about run-time speed. > A quote from this article: > >> " ... the switch very efficiently, even constructing a hash table if this >> method of switching is ..." Yes, it does. Such a old paper mentioned it and we are still not doing it. So it makes me wonder why. Thanks Jay