From: Noah Goldstein <goldstein.w.n@gmail.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
"Carlos O'Donell" <carlos@systemhalted.org>,
Alexander Monakov <amonakov@ispras.ru>
Subject: Re: [PATCH v7 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
Date: Tue, 10 May 2022 22:07:42 -0500 [thread overview]
Message-ID: <CAFUsyf+o1SW9LG2nYE0hLkR=jJA9JkUJNdzJkdjBzTehkdjP+w@mail.gmail.com> (raw)
In-Reply-To: <CAMe9rOquyvg3KBZQg3UWdZYBw+y+bx9J8X-LQRHhi8SL04NwQg@mail.gmail.com>
On Tue, May 10, 2022 at 6:47 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, May 10, 2022 at 4:30 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > Unroll slightly and enforce good instruction scheduling. This improves
> > performance on out-of-order machines. Note the unrolling allows
> > for pipelined multiplies which helps a bit, but most of the gain
> > is from enforcing better instruction scheduling for more ILP.
> > Unrolling further started to induce slowdowns for sizes [0, 4]
> > but can help the loop so if larger sizes are the target further
> > unrolling can be beneficial.
> >
> > Results for _dl_new_hash
> > Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
> >
> > Time as Geometric Mean of N=30 runs
> > Geometric of all benchmark New / Old: 0.674
> > type, length, New Time, Old Time, New Time / Old Time
> > fixed, 0, 2.865, 2.72, 1.053
> > fixed, 1, 3.567, 2.489, 1.433
> > fixed, 2, 2.577, 3.649, 0.706
> > fixed, 3, 3.644, 5.983, 0.609
> > fixed, 4, 4.211, 6.833, 0.616
> > fixed, 5, 4.741, 9.372, 0.506
> > fixed, 6, 5.415, 9.561, 0.566
> > fixed, 7, 6.649, 10.789, 0.616
> > fixed, 8, 8.081, 11.808, 0.684
> > fixed, 9, 8.427, 12.935, 0.651
> > fixed, 10, 8.673, 14.134, 0.614
> > fixed, 11, 10.69, 15.408, 0.694
> > fixed, 12, 10.789, 16.982, 0.635
> > fixed, 13, 12.169, 18.411, 0.661
> > fixed, 14, 12.659, 19.914, 0.636
> > fixed, 15, 13.526, 21.541, 0.628
> > fixed, 16, 14.211, 23.088, 0.616
> > fixed, 32, 29.412, 52.722, 0.558
> > fixed, 64, 65.41, 142.351, 0.459
> > fixed, 128, 138.505, 295.625, 0.469
> > fixed, 256, 291.707, 601.983, 0.485
> > random, 2, 12.698, 12.849, 0.988
> > random, 4, 16.065, 15.857, 1.013
> > random, 8, 19.564, 21.105, 0.927
> > random, 16, 23.919, 26.823, 0.892
> > random, 32, 31.987, 39.591, 0.808
> > random, 64, 49.282, 71.487, 0.689
> > random, 128, 82.23, 145.364, 0.566
> > random, 256, 152.209, 298.434, 0.51
> >
> > Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
> > ---
> > elf/dl-new-hash.h | 66 +++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 61 insertions(+), 5 deletions(-)
> >
> > diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h
> > index 40d88c81f9..c6d96a0452 100644
> > --- a/elf/dl-new-hash.h
> > +++ b/elf/dl-new-hash.h
> > @@ -20,15 +20,71 @@
> > #define _DL_NEW_HASH_H 1
> >
> > #include <stdint.h>
> > +/* For __glibc_unlikely. */
> > +#include <sys/cdefs.h>
> > +
> > +/* The simplest implementation of _dl_new_hash is:
> > +
> > +_dl_new_hash (const char *s)
> > +{
> > + uint32_t h = 5381;
> > + for (unsigned char c = *s; c != '\0'; c = *++s)
> > + h = h * 33 + c;
> > + return h;
> > +}
> > +
> > + We can get better performance, however, by slightly unrolling the
> > + loop and explicitly specifying order of operations to prevent
> > + reassosiation of instructions and ensure ideal scheduling. */
> >
> > static inline uint32_t
> > __attribute__ ((unused))
> > -_dl_new_hash (const char *s)
> > +_dl_new_hash (const char *str)
> > {
> > - uint32_t h = 5381;
> > - for (unsigned char c = *s; c != '\0'; c = *++s)
> > - h = h * 33 + c;
> > - return h;
> > + const unsigned char *s = (const unsigned char *) str;
> > + unsigned int h = 5381;
> > + unsigned int c0, c1;
> > + for (;;)
> > + {
> > + c0 = (unsigned int) *s;
>
> I prefer
>
> c0 = s[0];
>
> There is no need for case since "s" is a pointer to unsigned char.
Fixed in V8.
>
> > + /* Unlikely length zero string so evens will be slightly less
> even
> > + common. */
> > + if (__glibc_unlikely (c0 == 0))
> > + return h;
> > +
> > + c1 = (unsigned int) *(s + 1);
>
> c1 = s[1];
Fixed in V8.
>
> > + if (c1 == 0)
> > + {
> > + c0 += h;
> > + /* Ideal instruction scheduling is:
> > + c0 += h;
> > + h *= 32;
> > + h += c0;
> > +
> > + The asm statements are to prevent reassosiation that would result in
> > + more instruction interdependencies and worse scheduling. */
> > + asm("" : "+r"(h) : "r"(c0));
> asm ( << A space after asm.
> > + h = h * 32 + c0;
> > + return h;
> > + }
> > +
> > + /* Ideal instruction scheduling is:
> > + c1 += c0;
> > + h *= 33 * 33;
> > + c0 *= 32;
> > + c1 += c0;
> > + h += c1;
> > +
> > + The asm statements are to prevent reassosiation that would result in
> > + more instruction interdependencies and worse scheduling. */
> > + c1 += c0;
> > + asm("" : "+r"(c1), "+r"(c0));
> asm ( << A space after asm.
> > + h *= 33 * 33;
> > + c1 += c0 * 32;
> > + asm("" : "+r"(c1));
> asm ( << A space after asm.
> > + h += c1;
> > + s += 2;
> > + }
> > }
>
> This is faster on x86. Is this also true on other targets? Should it be moved
> to sysdeps/generic so that other targets can provide a different version?
I haven't tested any other targets but the optimizations are pretty generic so
would imagine similar results on other architectures
>
> >
> > --
> > 2.34.1
> >
>
>
> --
> H.J.
next prev parent reply other threads:[~2022-05-11 3:07 UTC|newest]
Thread overview: 167+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-14 4:12 [PATCH v1 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-14 4:12 ` [PATCH v1 2/6] elf: Add tests for the hash functions in dl-hash.h Noah Goldstein
2022-04-14 4:12 ` [PATCH v1 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-14 4:12 ` [PATCH v1 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-14 4:12 ` [PATCH v1 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-14 4:12 ` [PATCH v1 6/6] elf: Optimize __dl_new_hash in dl-hash.h Noah Goldstein
2022-04-14 4:32 ` [PATCH v1 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked H.J. Lu
2022-04-14 14:56 ` Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 " Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 15:39 ` Florian Weimer
2022-04-25 15:59 ` Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-25 15:38 ` Florian Weimer
2022-04-25 15:58 ` Noah Goldstein
2022-04-26 8:35 ` Florian Weimer
2022-04-26 21:39 ` Noah Goldstein
2022-04-27 10:48 ` Florian Weimer
2022-04-27 15:02 ` Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-25 16:01 ` [PATCH v3 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Adhemerval Zanella
2022-04-25 16:18 ` Noah Goldstein
2022-04-25 15:59 ` [PATCH v1 " Adhemerval Zanella
2022-04-25 16:16 ` Noah Goldstein
2022-04-25 16:35 ` [PATCH v3 " Noah Goldstein
2022-04-25 16:35 ` [PATCH v3 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 16:35 ` [PATCH v3 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-27 10:39 ` Florian Weimer
2022-04-27 16:24 ` Noah Goldstein
2022-04-25 16:35 ` [PATCH v3 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-25 16:36 ` [PATCH v3 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-27 10:47 ` Florian Weimer
2022-04-25 16:36 ` [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-27 10:43 ` Florian Weimer
2022-04-27 16:25 ` Noah Goldstein
2022-04-27 15:02 ` Alexander Monakov
[not found] ` <CAFUsyfKeocq4VAusvnggq-NR=tOQTjrD0Z6r3CYCTjGQ=tGGSw@mail.gmail.com>
[not found] ` <f54f1ec9-fc31-283f-bce9-59fd8bda98ad@ispras.ru>
2022-04-27 16:23 ` Noah Goldstein
2022-04-28 18:03 ` Alexander Monakov
2022-05-04 18:04 ` Alexander Monakov
2022-05-05 11:07 ` Alexander Monakov
2022-05-05 15:10 ` Noah Goldstein
2022-05-05 15:26 ` Alexander Monakov
2022-05-05 18:03 ` Noah Goldstein
2022-05-05 19:37 ` Alexander Monakov
2022-05-05 22:51 ` Noah Goldstein
2022-04-27 16:19 ` [PATCH v4 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-27 16:19 ` [PATCH v4 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-27 16:19 ` [PATCH v4 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-27 16:20 ` [PATCH v4 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-27 16:20 ` [PATCH v4 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-27 16:20 ` [PATCH v4 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 11:58 ` Adhemerval Zanella
2022-05-10 15:04 ` [PATCH v6 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-10 15:04 ` [PATCH v6 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-10 15:04 ` [PATCH v6 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-10 15:04 ` [PATCH v6 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-10 15:04 ` [PATCH v6 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-10 15:04 ` [PATCH v6 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 15:29 ` H.J. Lu
2022-05-10 15:31 ` H.J. Lu
2022-05-10 16:49 ` Alexander Monakov
2022-05-10 17:17 ` Noah Goldstein
2022-05-10 17:40 ` Alexander Monakov
2022-05-10 23:30 ` [PATCH v7 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-10 23:30 ` [PATCH v7 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-10 23:30 ` [PATCH v7 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-10 23:30 ` [PATCH v7 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-10 23:30 ` [PATCH v7 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-10 23:30 ` [PATCH v7 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 23:46 ` H.J. Lu
2022-05-11 3:07 ` Noah Goldstein [this message]
2022-05-11 3:06 ` [PATCH v8 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-11 3:06 ` [PATCH v8 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-11 3:06 ` [PATCH v8 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-11 3:06 ` [PATCH v8 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-11 3:06 ` [PATCH v8 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-11 3:06 ` [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-16 14:12 ` Siddhesh Poyarekar
2022-05-16 14:31 ` Alexander Monakov
2022-05-16 16:23 ` Siddhesh Poyarekar
2022-05-16 16:38 ` Noah Goldstein
2022-05-16 16:44 ` Siddhesh Poyarekar
2022-05-16 20:32 ` Noah Goldstein
2022-05-16 18:09 ` Alexander Monakov
2022-05-16 18:47 ` Siddhesh Poyarekar
2022-05-16 19:28 ` Alexander Monakov
2022-05-16 19:35 ` Noah Goldstein
2022-05-16 19:41 ` Alexander Monakov
2022-05-16 19:47 ` Adhemerval Zanella
2022-05-16 20:00 ` Alexander Monakov
2022-05-16 20:08 ` Adhemerval Zanella
2022-05-16 20:27 ` Alexander Monakov
2022-05-16 19:48 ` Noah Goldstein
2022-05-16 20:33 ` Alexander Monakov
2022-05-16 21:40 ` Noah Goldstein
2022-05-17 1:45 ` Siddhesh Poyarekar
2022-05-16 13:56 ` [PATCH v8 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-16 20:31 ` Noah Goldstein
2022-05-16 20:29 ` [PATCH v9 " Noah Goldstein
2022-05-16 20:30 ` [PATCH v9 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-17 4:19 ` Siddhesh Poyarekar
2022-05-18 17:29 ` Noah Goldstein
2022-05-16 20:30 ` [PATCH v9 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-17 4:32 ` Siddhesh Poyarekar
2022-05-18 17:30 ` Noah Goldstein
2022-05-16 20:30 ` [PATCH v9 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-17 4:52 ` Siddhesh Poyarekar
2022-05-18 17:33 ` Noah Goldstein
2022-05-16 20:30 ` [PATCH v9 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-17 5:11 ` Siddhesh Poyarekar
2022-05-18 17:34 ` Noah Goldstein
2022-05-18 17:35 ` Noah Goldstein
2022-05-16 20:30 ` [PATCH v9 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-17 5:12 ` Siddhesh Poyarekar
2022-05-18 17:38 ` Noah Goldstein
2022-05-19 15:59 ` Siddhesh Poyarekar
2022-05-19 16:54 ` DJ Delorie
2022-05-17 3:34 ` [PATCH v9 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-18 17:28 ` Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 " Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-19 14:49 ` Siddhesh Poyarekar
2022-05-18 17:26 ` [PATCH v10 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-19 15:09 ` Siddhesh Poyarekar
2022-05-19 15:40 ` Siddhesh Poyarekar
2022-05-19 22:20 ` Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-19 15:34 ` Siddhesh Poyarekar
2022-05-19 22:20 ` Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-19 15:41 ` Siddhesh Poyarekar
2022-05-19 22:21 ` Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-18 17:32 ` H.J. Lu
2022-05-18 17:39 ` Noah Goldstein
2022-05-19 7:53 ` Siddhesh Poyarekar
2022-05-19 15:55 ` Siddhesh Poyarekar
2022-05-19 22:22 ` Noah Goldstein
2022-05-19 14:47 ` [PATCH v10 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-19 14:50 ` Noah Goldstein
2022-05-19 14:56 ` Siddhesh Poyarekar
2022-05-19 22:17 ` [PATCH v11 " Noah Goldstein
2022-05-19 22:17 ` [PATCH v11 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-19 22:19 ` Noah Goldstein
2022-05-19 22:18 ` [PATCH v11 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-23 7:42 ` Siddhesh Poyarekar
2022-05-19 22:18 ` [PATCH v11 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-23 7:44 ` Siddhesh Poyarekar
2022-05-19 22:18 ` [PATCH v11 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-23 7:44 ` Siddhesh Poyarekar
2022-05-19 22:18 ` [PATCH v11 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-23 7:46 ` Siddhesh Poyarekar
2022-05-19 22:18 ` [PATCH v11 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFUsyf+o1SW9LG2nYE0hLkR=jJA9JkUJNdzJkdjBzTehkdjP+w@mail.gmail.com' \
--to=goldstein.w.n@gmail.com \
--cc=amonakov@ispras.ru \
--cc=carlos@systemhalted.org \
--cc=hjl.tools@gmail.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).