public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Siddhesh Poyarekar <siddhesh@gotplt.org>
To: Noah Goldstein <goldstein.w.n@gmail.com>, libc-alpha@sourceware.org
Cc: Alexander Monakov <amonakov@ispras.ru>
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
Date: Mon, 16 May 2022 19:42:27 +0530	[thread overview]
Message-ID: <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org> (raw)
In-Reply-To: <20220511030635.154689-6-goldstein.w.n@gmail.com>

On 11/05/2022 08:36, Noah Goldstein via Libc-alpha wrote:
> Unroll slightly and enforce good instruction scheduling. This improves
> performance on out-of-order machines. Note the unrolling allows
> for pipelined multiplies which helps a bit, but most of the gain
> is from enforcing better instruction scheduling for more ILP.
> Unrolling further started to induce slowdowns for sizes [0, 4]
> but can help the loop so if larger sizes are the target further
> unrolling can be beneficial.
> 
> Results for _dl_new_hash
> Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
> 
> Time as Geometric Mean of N=30 runs
> Geometric of all benchmark New / Old: 0.674
>    type, length, New Time, Old Time, New Time / Old Time
>   fixed,      0,    2.865,     2.72,               1.053
>   fixed,      1,    3.567,    2.489,               1.433
>   fixed,      2,    2.577,    3.649,               0.706
>   fixed,      3,    3.644,    5.983,               0.609
>   fixed,      4,    4.211,    6.833,               0.616
>   fixed,      5,    4.741,    9.372,               0.506
>   fixed,      6,    5.415,    9.561,               0.566
>   fixed,      7,    6.649,   10.789,               0.616
>   fixed,      8,    8.081,   11.808,               0.684
>   fixed,      9,    8.427,   12.935,               0.651
>   fixed,     10,    8.673,   14.134,               0.614
>   fixed,     11,    10.69,   15.408,               0.694
>   fixed,     12,   10.789,   16.982,               0.635
>   fixed,     13,   12.169,   18.411,               0.661
>   fixed,     14,   12.659,   19.914,               0.636
>   fixed,     15,   13.526,   21.541,               0.628
>   fixed,     16,   14.211,   23.088,               0.616
>   fixed,     32,   29.412,   52.722,               0.558
>   fixed,     64,    65.41,  142.351,               0.459
>   fixed,    128,  138.505,  295.625,               0.469
>   fixed,    256,  291.707,  601.983,               0.485
> random,      2,   12.698,   12.849,               0.988
> random,      4,   16.065,   15.857,               1.013
> random,      8,   19.564,   21.105,               0.927
> random,     16,   23.919,   26.823,               0.892
> random,     32,   31.987,   39.591,               0.808
> random,     64,   49.282,   71.487,               0.689
> random,    128,    82.23,  145.364,               0.566
> random,    256,  152.209,  298.434,                0.51
> 
> Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
> ---
>   elf/dl-new-hash.h | 66 +++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 61 insertions(+), 5 deletions(-)
> 
> diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h
> index 40d88c81f9..5269f6eb98 100644
> --- a/elf/dl-new-hash.h
> +++ b/elf/dl-new-hash.h
> @@ -20,15 +20,71 @@
>   #define _DL_NEW_HASH_H 1
>   
>   #include <stdint.h>
> +/* For __glibc_unlikely.  */
> +#include <sys/cdefs.h>
> +
> +/* The simplest implementation of _dl_new_hash is:
> +
> +_dl_new_hash (const char *s)
> +{
> +   uint32_t h = 5381;
> +   for (unsigned char c = *s; c != '\0'; c = *++s)
> +     h = h * 33 + c;
> +   return h;
> +}
> +
> +   We can get better performance, however, by slightly unrolling the
> +   loop and explicitly specifying order of operations to prevent
> +   reassosiation of instructions and ensure ideal scheduling.  */
>   
>   static inline uint32_t
>   __attribute__ ((unused))
> -_dl_new_hash (const char *s)
> +_dl_new_hash (const char *str)
>   {
> -  uint32_t h = 5381;
> -  for (unsigned char c = *s; c != '\0'; c = *++s)
> -    h = h * 33 + c;
> -  return h;
> +  const unsigned char *s = (const unsigned char *) str;
> +  unsigned int h = 5381;
> +  unsigned int c0, c1;
> +  for (;;)
> +    {
> +      c0 = s[0];
> +      /* Unlikely length zero string so evens will be slightly less
> +	 common.  */
> +      if (__glibc_unlikely (c0 == 0))
> +	return h;
> +
> +      c1 = s[1];
> +      if (c1 == 0)
> +	{
> +	  c0 += h;
> +	  /* Ideal instruction scheduling is:
> +	 c0 += h;
> +	 h *= 32;
> +	 h += c0;
> +
> +	 The asm statements are to prevent reassosiation that would result in
> +	 more instruction interdependencies and worse scheduling.  */
> +	  asm("" : "+r"(h) : "r"(c0));

There are a couple of things that seem problematic to me about this:

- It seems like we're trying to fix a gcc issue in glibc.  Couldn't we 
file a gcc bug and explore ways in which this could be supported in the 
compiler?  In fact, it might make sense to do that for the original 
loop; it looks like a missed optimization that gcc ought to fix.  IMO 
the bug should be filed even if we do end up with this 
micro-optimization in glibc.

- The patch controls an instruction schedule so that it works well on 
out-of-order processors but then only quoting one microarchitecture.  If 
it works well on TigerLake (and on x86 in general) then it might be 
better to add it as a sysdep override; I assumed that was the point of 
breaking the function out into its header anyway.  If it is more 
generally useful then please share numbers to that effect in the commit 
message and also explicitly state in the comments why we're trying to 
exert this level of control on codegen in generic C code and why it is 
good for all architectures.

> +	  h = h * 32 + c0;
> +	  return h;
> +	}
> +
> +      /* Ideal instruction scheduling is:
> +	 c1 += c0;
> +	 h *= 33 * 33;
> +	 c0 *= 32;
> +	 c1 += c0;
> +	 h  += c1;
> +
> +	 The asm statements are to prevent reassosiation that would result in
> +	 more instruction interdependencies and worse scheduling.  */
> +      c1 += c0;
> +      asm("" : "+r"(c1), "+r"(c0));
> +      h *= 33 * 33;
> +      c1 += c0 * 32;
> +      asm("" : "+r"(c1));
> +      h += c1;
> +      s += 2;
> +    }
>   }
>   
>   

Thanks,
Siddhesh

  reply	other threads:[~2022-05-16 14:12 UTC|newest]

Thread overview: 167+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14  4:12 [PATCH v1 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-14  4:12 ` [PATCH v1 2/6] elf: Add tests for the hash functions in dl-hash.h Noah Goldstein
2022-04-14  4:12 ` [PATCH v1 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-14  4:12 ` [PATCH v1 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-14  4:12 ` [PATCH v1 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-14  4:12 ` [PATCH v1 6/6] elf: Optimize __dl_new_hash in dl-hash.h Noah Goldstein
2022-04-14  4:32 ` [PATCH v1 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked H.J. Lu
2022-04-14 14:56   ` Noah Goldstein
2022-04-14 14:55 ` [PATCH v2 " Noah Goldstein
2022-04-14 14:55   ` [PATCH v2 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 15:39     ` Florian Weimer
2022-04-25 15:59       ` Noah Goldstein
2022-04-14 14:55   ` [PATCH v2 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-25 15:38     ` Florian Weimer
2022-04-25 15:58       ` Noah Goldstein
2022-04-26  8:35         ` Florian Weimer
2022-04-26 21:39           ` Noah Goldstein
2022-04-27 10:48             ` Florian Weimer
2022-04-27 15:02               ` Noah Goldstein
2022-04-14 14:55   ` [PATCH v2 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-14 14:55   ` [PATCH v2 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-14 14:55   ` [PATCH v2 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-25 15:58 ` [PATCH v3 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-25 15:58   ` [PATCH v3 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 15:58   ` [PATCH v3 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-25 15:58   ` [PATCH v3 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-25 15:58   ` [PATCH v3 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-25 15:58   ` [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-25 16:01   ` [PATCH v3 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Adhemerval Zanella
2022-04-25 16:18     ` Noah Goldstein
2022-04-25 15:59 ` [PATCH v1 " Adhemerval Zanella
2022-04-25 16:16   ` Noah Goldstein
2022-04-25 16:35 ` [PATCH v3 " Noah Goldstein
2022-04-25 16:35   ` [PATCH v3 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-25 16:35   ` [PATCH v3 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-27 10:39     ` Florian Weimer
2022-04-27 16:24       ` Noah Goldstein
2022-04-25 16:35   ` [PATCH v3 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-25 16:36   ` [PATCH v3 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-27 10:47     ` Florian Weimer
2022-04-25 16:36   ` [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-04-27 10:43     ` Florian Weimer
2022-04-27 16:25       ` Noah Goldstein
2022-04-27 15:02     ` Alexander Monakov
     [not found]       ` <CAFUsyfKeocq4VAusvnggq-NR=tOQTjrD0Z6r3CYCTjGQ=tGGSw@mail.gmail.com>
     [not found]         ` <f54f1ec9-fc31-283f-bce9-59fd8bda98ad@ispras.ru>
2022-04-27 16:23           ` Noah Goldstein
2022-04-28 18:03             ` Alexander Monakov
2022-05-04 18:04               ` Alexander Monakov
2022-05-05 11:07                 ` Alexander Monakov
2022-05-05 15:10                   ` Noah Goldstein
2022-05-05 15:26                     ` Alexander Monakov
2022-05-05 18:03                       ` Noah Goldstein
2022-05-05 19:37                         ` Alexander Monakov
2022-05-05 22:51                           ` Noah Goldstein
2022-04-27 16:19 ` [PATCH v4 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-04-27 16:19   ` [PATCH v4 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-04-27 16:19   ` [PATCH v4 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-04-27 16:20   ` [PATCH v4 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-04-27 16:20   ` [PATCH v4 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-04-27 16:20   ` [PATCH v4 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-09 17:17 ` [PATCH v5 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-09 17:17   ` [PATCH v5 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-09 17:17   ` [PATCH v5 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-09 17:17   ` [PATCH v5 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-09 17:17   ` [PATCH v5 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-09 17:17   ` [PATCH v5 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 11:58     ` Adhemerval Zanella
2022-05-10 15:04 ` [PATCH v6 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-10 15:04   ` [PATCH v6 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-10 15:04   ` [PATCH v6 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-10 15:04   ` [PATCH v6 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-10 15:04   ` [PATCH v6 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-10 15:04   ` [PATCH v6 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 15:29     ` H.J. Lu
2022-05-10 15:31       ` H.J. Lu
2022-05-10 16:49     ` Alexander Monakov
2022-05-10 17:17       ` Noah Goldstein
2022-05-10 17:40         ` Alexander Monakov
2022-05-10 23:30 ` [PATCH v7 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-10 23:30   ` [PATCH v7 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-10 23:30   ` [PATCH v7 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-10 23:30   ` [PATCH v7 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-10 23:30   ` [PATCH v7 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-10 23:30   ` [PATCH v7 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-10 23:46     ` H.J. Lu
2022-05-11  3:07       ` Noah Goldstein
2022-05-11  3:06 ` [PATCH v8 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein
2022-05-11  3:06   ` [PATCH v8 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-11  3:06   ` [PATCH v8 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-11  3:06   ` [PATCH v8 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-11  3:06   ` [PATCH v8 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-11  3:06   ` [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-16 14:12     ` Siddhesh Poyarekar [this message]
2022-05-16 14:31       ` Alexander Monakov
2022-05-16 16:23         ` Siddhesh Poyarekar
2022-05-16 16:38           ` Noah Goldstein
2022-05-16 16:44             ` Siddhesh Poyarekar
2022-05-16 20:32               ` Noah Goldstein
2022-05-16 18:09       ` Alexander Monakov
2022-05-16 18:47         ` Siddhesh Poyarekar
2022-05-16 19:28           ` Alexander Monakov
2022-05-16 19:35             ` Noah Goldstein
2022-05-16 19:41               ` Alexander Monakov
2022-05-16 19:47                 ` Adhemerval Zanella
2022-05-16 20:00                   ` Alexander Monakov
2022-05-16 20:08                     ` Adhemerval Zanella
2022-05-16 20:27                       ` Alexander Monakov
2022-05-16 19:48                 ` Noah Goldstein
2022-05-16 20:33                   ` Alexander Monakov
2022-05-16 21:40                     ` Noah Goldstein
2022-05-17  1:45             ` Siddhesh Poyarekar
2022-05-16 13:56   ` [PATCH v8 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-16 20:31     ` Noah Goldstein
2022-05-16 20:29 ` [PATCH v9 " Noah Goldstein
2022-05-16 20:30   ` [PATCH v9 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-17  4:19     ` Siddhesh Poyarekar
2022-05-18 17:29       ` Noah Goldstein
2022-05-16 20:30   ` [PATCH v9 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-17  4:32     ` Siddhesh Poyarekar
2022-05-18 17:30       ` Noah Goldstein
2022-05-16 20:30   ` [PATCH v9 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-17  4:52     ` Siddhesh Poyarekar
2022-05-18 17:33       ` Noah Goldstein
2022-05-16 20:30   ` [PATCH v9 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-17  5:11     ` Siddhesh Poyarekar
2022-05-18 17:34       ` Noah Goldstein
2022-05-18 17:35         ` Noah Goldstein
2022-05-16 20:30   ` [PATCH v9 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-17  5:12     ` Siddhesh Poyarekar
2022-05-18 17:38       ` Noah Goldstein
2022-05-19 15:59         ` Siddhesh Poyarekar
2022-05-19 16:54           ` DJ Delorie
2022-05-17  3:34   ` [PATCH v9 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-18 17:28     ` Noah Goldstein
2022-05-18 17:26 ` [PATCH v10 " Noah Goldstein
2022-05-18 17:26   ` [PATCH v10 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-19 14:49     ` Siddhesh Poyarekar
2022-05-18 17:26   ` [PATCH v10 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-19 15:09     ` Siddhesh Poyarekar
2022-05-19 15:40       ` Siddhesh Poyarekar
2022-05-19 22:20         ` Noah Goldstein
2022-05-18 17:26   ` [PATCH v10 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-19 15:34     ` Siddhesh Poyarekar
2022-05-19 22:20       ` Noah Goldstein
2022-05-18 17:26   ` [PATCH v10 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-19 15:41     ` Siddhesh Poyarekar
2022-05-19 22:21       ` Noah Goldstein
2022-05-18 17:26   ` [PATCH v10 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-18 17:32     ` H.J. Lu
2022-05-18 17:39       ` Noah Goldstein
2022-05-19  7:53       ` Siddhesh Poyarekar
2022-05-19 15:55     ` Siddhesh Poyarekar
2022-05-19 22:22       ` Noah Goldstein
2022-05-19 14:47   ` [PATCH v10 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Siddhesh Poyarekar
2022-05-19 14:50     ` Noah Goldstein
2022-05-19 14:56       ` Siddhesh Poyarekar
2022-05-19 22:17 ` [PATCH v11 " Noah Goldstein
2022-05-19 22:17   ` [PATCH v11 2/6] elf: Add tests for the dl hash funcs (_dl_new_hash and _dl_elf_hash) Noah Goldstein
2022-05-19 22:19     ` Noah Goldstein
2022-05-19 22:18   ` [PATCH v11 3/6] nss: Add tests for the nss_hash in nss_hash.h Noah Goldstein
2022-05-23  7:42     ` Siddhesh Poyarekar
2022-05-19 22:18   ` [PATCH v11 4/6] benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hash Noah Goldstein
2022-05-23  7:44     ` Siddhesh Poyarekar
2022-05-19 22:18   ` [PATCH v11 5/6] nss: Optimize nss_hash in nss_hash.c Noah Goldstein
2022-05-23  7:44     ` Siddhesh Poyarekar
2022-05-19 22:18   ` [PATCH v11 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Noah Goldstein
2022-05-23  7:46     ` Siddhesh Poyarekar
2022-05-19 22:18   ` [PATCH v11 1/6] elf: Refactor dl_new_hash so it can be tested / benchmarked Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org \
    --to=siddhesh@gotplt.org \
    --cc=amonakov@ispras.ru \
    --cc=goldstein.w.n@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).