public inbox for glibc-bugs@sourceware.org
From: "witold.baryluk+sourceware at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug libc/25924] Very poor choice of hash function in hsearch
Date: Wed, 06 May 2020 20:40:37 +0000
Message-ID: <bug-25924-131-ncaaZbUIcw@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-25924-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=25924

--- Comment #4 from Witold Baryluk <witold.baryluk+sourceware at gmail dot com> ---
I did some more benchmarking on amd64 (AMD Threadripper 2950X), excluding the
time spent in malloc / snprintf. I also instrumented the while loop in
hsearch_r to count collisions during insertion (ENTER). The test does 20M
insertions into a table of size 30M. Times are wall clock, best of 5 runs.
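
For reference, a harness along these lines can be sketched as follows. This is
a minimal sketch, not the code used for the numbers in this comment: the
function name time_inserts, the 16-byte key buffers, and the use of
clock_gettime are assumptions made for illustration. It pre-builds all keys so
that malloc / snprintf stay outside the timed region, and relies on the GNU
extensions hcreate_r / hsearch_r / hdestroy_r.

```c
#define _GNU_SOURCE
#include <search.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Insert n keys formatted with fmt ("%d" or "%x") into a fresh table
   created for table_size entries.  Returns the elapsed wall-clock seconds
   of the ENTER loop only, or -1.0 on failure.  (Hypothetical helper.)  */
double time_inserts(const char *fmt, size_t n, size_t table_size)
{
    struct hsearch_data htab;
    memset(&htab, 0, sizeof htab);   /* must be zeroed before hcreate_r */
    if (!hcreate_r(table_size, &htab))
        return -1.0;

    /* Build every key up front so formatting and allocation are not timed. */
    char (*keys)[16] = malloc(n * sizeof *keys);
    if (keys == NULL) {
        hdestroy_r(&htab);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++)
        snprintf(keys[i], sizeof keys[i], fmt, (int)i);

    int ok = 1;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n && ok; i++) {
        ENTRY item = { .key = keys[i], .data = NULL };
        ENTRY *slot;
        if (!hsearch_r(item, ENTER, &slot, &htab))
            ok = 0;                  /* table full or out of memory */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    hdestroy_r(&htab);               /* frees the table, not the keys */
    free(keys);
    if (!ok)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_inserts("%d", 20000000, 30000000) and
time_inserts("%x", 20000000, 30000000) five times each and keeping the minimum
would approximate the setup described above. Counting collisions still
requires instrumenting hsearch_r itself, which this sketch does not do.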

old hash:
dec_keys  1.144s   6149640 collisions
hex_keys  3.319s  87011211 collisions

new hash, fasthash64:
dec_keys  1.146s   2313573 collisions
hex_keys  1.170s   2312409 collisions

This is just an example, but it shows that the new hash (or any other good
hash) produces significantly fewer collisions, and that "%d" and "%x" style
keys get essentially the same (fast) performance with no collision blow-up.
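
The effect can also be explored outside glibc with a small stand-alone
collision counter. The sketch below rests on several assumptions: it uses its
own linear-probing table rather than the probing scheme in hsearch_r,
shift_add_hash is only a stand-in for a weak hash, and fnv1a_hash (64-bit
FNV-1a) is swapped in for fasthash64. All names are hypothetical, and the
counts it produces are not comparable to the numbers above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t (*hash_fn)(const char *key);

/* Weak stand-in: start from the length, then shift by 4 bits and add each
   byte, all in 32 bits.  Not claimed to match the hash in hsearch_r.  */
uint64_t shift_add_hash(const char *key)
{
    uint32_t h = (uint32_t)strlen(key);
    for (const char *p = key; *p; p++)
        h = (h << 4) + (unsigned char)*p;
    return h;
}

/* Stronger stand-in: 64-bit FNV-1a (used here instead of fasthash64).  */
uint64_t fnv1a_hash(const char *key)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (const char *p = key; *p; p++) {
        h ^= (unsigned char)*p;
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* Insert keys 0..n-1, formatted with fmt, into a linear-probing table of
   `size` slots.  Returns how many occupied slots were stepped over during
   insertion, or -1 on error.  Keys are distinct, so no equality check is
   needed when probing.  */
long count_collisions(hash_fn hash, const char *fmt, size_t n, size_t size)
{
    if (n > size)
        return -1;
    char (*slots)[16] = calloc(size, sizeof *slots);  /* "" marks empty */
    if (slots == NULL)
        return -1;

    long collisions = 0;
    for (size_t i = 0; i < n; i++) {
        char key[16];
        snprintf(key, sizeof key, fmt, (int)i);
        size_t idx = hash(key) % size;
        while (slots[idx][0] != '\0') {      /* slot taken: probe onward */
            collisions++;
            idx = (idx + 1) % size;
        }
        strcpy(slots[idx], key);
    }
    free(slots);
    return collisions;
}
```

Comparing count_collisions(shift_add_hash, "%x", n, size) against
count_collisions(fnv1a_hash, "%x", n, size) for a moderate n gives a rough
feel for how much the choice of hash matters. Linear probing clusters badly
under a weak hash, so very large n may take a long time.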

I did not measure lookup (FIND), but given the number of collisions with the
existing hash, I expect the old hash to perform very badly.

