To: Michael Veksler
Cc: Joe Buck, gcc@gcc.gnu.org
Subject: Re: tr1::unordered_set bizarre rounding behavior (x86)
From: Gabriel Dos Reis
Date: Tue, 05 Jul 2005 19:24:00 -0000

Michael Veksler writes:

| Joe Buck wrote on 05/07/2005 21:10:25:
|
| > On Tue, Jul 05, 2005 at 08:05:39PM +0200, Gabriel Dos Reis wrote:
| > > It is definitely a good thing to use the full bits of value
| > > representation if we ever want to make all "interesting" bits part of
| > > the hash value.  For reasonable or sane representations it suffices to
| > > get your hand on the object representation, e.g.:
| > >
| > >     const int objsize = sizeof (double);
| > >     typedef unsigned char objrep_t[objsize];
| > >     double x = ....;
| > >     objrep_t& p = reinterpret_cast<objrep_t&>(x);
| > >     // ...
| > >
| > > and let frexp and friends only for less obvious value representation.
| >
| > I disagree; on an ILP32 machine, we pull out only 32 bits for the hash
| > value, and if you aren't careful, your approach will wind up using the
| > least significant bits of the mantissa.  This will cause all values that
| > are exactly representable as floats to collide.
|
| For that you can do something like (or templated equivalent):
|   namespace Impl
|   {
|     template <typename T>
|     size_t floating_point_hash(T in)
|     {
|       if (sizeof(in) <= sizeof(size_t))
|         Use Gaby's solution, with zero padding;
|       else
|         frexp and friends using Joe Buck's ideas;
|     }
|   }
|
| Gaby's solution should be done with care - to avoid any
| aliasing issues (never go directly from double& to size_t&).

The standard explicitly permits you to regard any object as an array of
unsigned char.  Given that, and given no padding bits (i.e. the "sane"
representation assumption), hashing any object larger than a size_t is
no different from hashing a character string.  Now, the question is
how to make sure we do not have padding bits.  For most targets, that
assumption is OK; only the one subject of this discussion seems to
pose problems ;-)

| Both Gaby's and Joe Buck's solutions do not take
| the strangeness of IEEE (NNN?) into account.
| As I remember it (I don't have the reference at home),
| IEEE FP has many bit-representations for NaN, each
| containing some bit-encoding of errors.

My proposal explicitly takes that into account, in the sense that it
looks at all bits of the value representation, and therefore at the
encoding bits of the NaNs too.

| "There *should* be a specialization for equal_to that
| provides a strict weak ordering for NaNs as well as other
| values." [quoted forwarded mail from P.J. Plauger]
| Doing bit-wise conversions will not address this requirement.

I do not understand what you mean by that.

-- Gaby
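
A minimal sketch of the byte-wise approach discussed above, assuming
double has no padding bits (true of IEEE 754 binary64, not of the x87
80-bit long double).  The names hash_bytes and hash_double are
hypothetical, and FNV-1a is swapped in here only as one familiar
string-style hash; nothing in the thread prescribes it.  Because every
byte is mixed in, values exactly representable as float do not collide
merely because their low mantissa bits are zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: FNV-1a over a byte sequence, i.e. the same
// treatment one would give a character string.
inline std::size_t hash_bytes(const unsigned char* p, std::size_t n)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    // Fold the high half into the low half so that a 32-bit size_t
    // still depends on all 64 bits of the state.
    return static_cast<std::size_t>(h ^ (h >> 32));
}

// Hypothetical helper: hash a double through its object representation.
// ASSUMPTION: double has no padding bits on the target.
inline std::size_t hash_double(double x)
{
    // +0.0 and -0.0 compare equal but differ in the sign bit, so map
    // both to +0.0 before looking at the bytes.
    if (x == 0.0)
        x = 0.0;

    // Copy into an unsigned char array rather than casting double& to
    // size_t&; this sidesteps the aliasing issue raised in the thread.
    unsigned char rep[sizeof x];
    std::memcpy(rep, &x, sizeof x);
    return hash_bytes(rep, sizeof rep);
}
```

Note that distinct NaN encodings hash differently here, which is
harmless for a hash table keyed with operator== because a NaN never
compares equal to anything, itself included.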