From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by sourceware.org (Postfix) with ESMTPS id 88C573856254 for ; Wed, 18 May 2022 17:32:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 88C573856254 Received: by mail-pj1-x102c.google.com with SMTP id a23-20020a17090acb9700b001df4e9f4870so2769594pju.1 for ; Wed, 18 May 2022 10:32:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=yAoUKtSs1hjqDZ/JZnI0MG+ThhH8SGL/xIHwlE7++RU=; b=yDuzu74QHH1rNyp5ny8TvZ6z172fF2U2TgdFp+YzgZuFv6ZBqp9538fdMApqaSqAX8 xemHvjruYzmGtLUA2f3ygYD5x4notwa8dD/PzlWpvedVx2Jv/dv0XWSDOGX9pQmBlcuu 5v4HkUXIc654lyrWf4oxht8w1I40VRex7yzs5he0U138VL5WEXTysvb+CN6gaOP3snB3 coNqOhzkHCpIxwu2Mb/JJdGSsNVNwJ40gQkTzVp3KoA9AyLrXSONlk3c6xnzesNqxFjw ruBZo9xoDMdIbIqtDZEpqpWTRg+RPlsuj8qvt3K7j9q3++p7xW7+ozR5q/fL2BI4qFvI dFeQ== X-Gm-Message-State: AOAM5323zESXkkElEXpEVbiVKCgEU6SOK5eP3IG/k8NiJ0UofNT2MuAw JFdY9Mfquj1sDcv+xOjZuMnNKoWaSY9lDXrCDwBDhj+gJTk= X-Google-Smtp-Source: ABdhPJz6xDuSrlzOlhgfYOOJXWYXgEhPkjpl1Eulzvxu0lzVLFwWyOc7gBnSesKje13xWxEOpWbutZK1FnI4TeAV7K4= X-Received: by 2002:a17:902:ccc1:b0:15a:24df:a7cc with SMTP id z1-20020a170902ccc100b0015a24dfa7ccmr670042ple.42.1652895165404; Wed, 18 May 2022 10:32:45 -0700 (PDT) MIME-Version: 1.0 References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220518172635.291119-1-goldstein.w.n@gmail.com> <20220518172635.291119-6-goldstein.w.n@gmail.com> In-Reply-To: <20220518172635.291119-6-goldstein.w.n@gmail.com> From: "H.J. Lu" Date: Wed, 18 May 2022 10:32:09 -0700 Message-ID: Subject: Re: [PATCH v10 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h To: Noah Goldstein Cc: GNU C Library , "Carlos O'Donell" , Alexander Monakov Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3025.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2022 17:32:49 -0000 andOn Wed, May 18, 2022 at 10:26 AM Noah Goldstein wrote: > > Unroll slightly and enforce good instruction scheduling. This improves > performance on out-of-order machines. The unrolling allows for > pipelined multiplies. > > As well, as an optional sysdep, reorder the operations and prevent > reassosiation for better scheduling and higher ILP. This commit > only adds the barrier for x86, although it should be either no > change or a win for any architecture. > > Unrolling further started to induce slowdowns for sizes [0, 4] > but can help the loop so if larger sizes are the target further > unrolling can be beneficial. > > Results for _dl_new_hash > Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz > > Time as Geometric Mean of N=30 runs > Geometric of all benchmark New / Old: 0.674 > type, length, New Time, Old Time, New Time / Old Time > fixed, 0, 2.865, 2.72, 1.053 > fixed, 1, 3.567, 2.489, 1.433 > fixed, 2, 2.577, 3.649, 0.706 > fixed, 3, 3.644, 5.983, 0.609 > fixed, 4, 4.211, 6.833, 0.616 > fixed, 5, 4.741, 9.372, 0.506 > fixed, 6, 5.415, 9.561, 0.566 > fixed, 7, 6.649, 10.789, 0.616 > fixed, 8, 8.081, 11.808, 0.684 > fixed, 9, 8.427, 12.935, 0.651 > fixed, 10, 8.673, 14.134, 0.614 > fixed, 11, 10.69, 15.408, 0.694 > fixed, 12, 10.789, 16.982, 0.635 > fixed, 13, 12.169, 18.411, 0.661 > fixed, 14, 12.659, 19.914, 0.636 > fixed, 15, 13.526, 21.541, 0.628 > fixed, 16, 14.211, 23.088, 0.616 > fixed, 32, 29.412, 52.722, 0.558 > fixed, 64, 65.41, 142.351, 0.459 > fixed, 128, 138.505, 295.625, 0.469 > fixed, 256, 291.707, 601.983, 0.485 > random, 2, 12.698, 12.849, 0.988 > random, 4, 16.065, 15.857, 1.013 > random, 8, 19.564, 21.105, 0.927 > random, 16, 23.919, 26.823, 0.892 > random, 32, 31.987, 39.591, 0.808 > random, 64, 49.282, 71.487, 0.689 > random, 128, 82.23, 145.364, 0.566 > random, 256, 152.209, 298.434, 0.51 > > Co-authored-by: Alexander Monakov > --- > benchtests/bench-dl-new-hash.c | 3 +- > elf/{dl-new-hash.h => simple-dl-new-hash.h} | 20 ++-- > elf/tst-dl-hash.c | 1 + > sysdeps/generic/dl-new-hash.h | 111 ++++++++++++++++++++ > sysdeps/x86/dl-new-hash.h | 24 +++++ > 5 files changed, 146 insertions(+), 13 deletions(-) > rename elf/{dl-new-hash.h => simple-dl-new-hash.h} (75%) > create mode 100644 sysdeps/generic/dl-new-hash.h > create mode 100644 sysdeps/x86/dl-new-hash.h > > diff --git a/benchtests/bench-dl-new-hash.c b/benchtests/bench-dl-new-hash.c > index 3c8a1d5a82..040fa7ce01 100644 > --- a/benchtests/bench-dl-new-hash.c > +++ b/benchtests/bench-dl-new-hash.c > @@ -16,7 +16,8 @@ > License along with the GNU C Library; if not, see > . */ > > -#include > +#include > +#include > #define TEST_FUNC(x, y) _dl_new_hash (x) > #define SIMPLE_TEST_FUNC(x, y) __simple_dl_new_hash (x) > > diff --git a/elf/dl-new-hash.h b/elf/simple-dl-new-hash.h > similarity index 75% > rename from elf/dl-new-hash.h > rename to elf/simple-dl-new-hash.h > index 8641bb4196..1437b1bd36 100644 > --- a/elf/dl-new-hash.h > +++ b/elf/simple-dl-new-hash.h > @@ -1,4 +1,4 @@ > -/* _dl_new_hash for elf symbol lookup > +/* __simple_dl_new_hash for testing true elf symbol lookup. > Copyright (C) 2022 Free Software Foundation, Inc. > This file is part of the GNU C Library. > > @@ -16,16 +16,16 @@ > License along with the GNU C Library; if not, see > . */ > > -#ifndef _DL_NEW_HASH_H > -#define _DL_NEW_HASH_H 1 > +#ifndef _SIMPLE_DL_NEW_HASH_H > +#define _SIMPLE_DL_NEW_HASH_H 1 > > #include > -/* For __always_inline. */ > -#include > > -static __always_inline uint32_t > +/* For testing/benchmarking purposes. Real implementation in > + sysdeps/generic/dl-new-hash.h. */ > +static uint32_t > __attribute__ ((unused)) > -_dl_new_hash (const char *s) > +__simple_dl_new_hash (const char *s) > { > uint32_t h = 5381; > for (unsigned char c = *s; c != '\0'; c = *++s) > @@ -33,8 +33,4 @@ _dl_new_hash (const char *s) > return h; > } > > -/* For testing/benchmarking purposes. */ > -#define __simple_dl_new_hash _dl_new_hash > - > - > -#endif /* dl-new-hash.h */ > +#endif /* simple-dl-new-hash.h */ > diff --git a/elf/tst-dl-hash.c b/elf/tst-dl-hash.c > index 8697eb73a0..b21766c63d 100644 > --- a/elf/tst-dl-hash.c > +++ b/elf/tst-dl-hash.c > @@ -18,6 +18,7 @@ > > > #include > +#include > #include > #include > #include > diff --git a/sysdeps/generic/dl-new-hash.h b/sysdeps/generic/dl-new-hash.h > new file mode 100644 > index 0000000000..1faf309c97 > --- /dev/null > +++ b/sysdeps/generic/dl-new-hash.h > @@ -0,0 +1,111 @@ > +/* _dl_new_hash for elf symbol lookup > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _DL_NEW_HASH_H > +#define _DL_NEW_HASH_H 1 > + > +#include > +/* For __always_inline. */ > +#include > +/* For __glibc_unlikely. */ > +#include > + > +/* The simplest implementation of _dl_new_hash is: > + > + _dl_new_hash (const char *s) > + { > + uint32_t h = 5381; > + for (unsigned char c = *s; c != '\0'; c = *++s) > + h = h * 33 + c; > + return h; > + } > + > + We can get better performance by slightly unrolling the loop to > + pipeline the multiples, which gcc cannot easily do due to > + dependencies across iterations. > + > + As well, as an architecture specific option we add asm statements > + to explicitly specify order of operations and prevent reassociation > + of instructions that lengthens the loop carried dependency. This > + may have no affect as the compiler may have ordered instructions > + the same way without it but in testing this has not been the case > + for GCC. Improving GCC to reliably schedule instructions ideally > + cannot be easily done. > + > + Architecture(s) that use the reassociation barries are: > + x86 > + > + Note it is very unlikely the reassociation barriers would > + de-optimize performance on any architecture and with an imperfect > + compiler it may help performance, especially on out-of-order cpus, > + so it is suggested that the respective maintainers add them. > + > + architecture maintainers are encouraged to benchmark this with > + __asm_reassociation_barrier defined to __asm__ like it is in x86. > +*/ > + > + > +#ifndef __asm_reassociation_barrier > +# define __asm_reassociation_barrier(...) > +#endif > + > +static __always_inline uint32_t > +__attribute__ ((unused)) > +_dl_new_hash (const char *str) > +{ > + const unsigned char *s = (const unsigned char *) str; > + unsigned int h = 5381; > + unsigned int c0, c1; > + for (;;) > + { > + c0 = s[0]; > + /* Since hashed string is normally not empty, this is unlikely on the > + first iteration of the loop. */ > + if (__glibc_unlikely (c0 == 0)) > + return h; > + > + c1 = s[1]; > + if (c1 == 0) > + { > + /* Ideal computational order is: > + c0 += h; > + h *= 32; > + h += c0; */ > + c0 += h; > + __asm_reassociation_barrier("" : "+r"(h) : "r"(c0)); > + h = h * 32 + c0; > + return h; > + } > + > + /* Ideal computational order is: > + c1 += c0; > + h *= 33 * 33; > + c0 *= 32; > + c1 += c0; > + h += c1; */ > + c1 += c0; > + __asm_reassociation_barrier("" : "+r"(c1), "+r"(c0)); > + h *= 33 * 33; > + c1 += c0 * 32; > + __asm_reassociation_barrier("" : "+r"(c1)); > + h += c1; > + s += 2; > + } > +} > + > +#endif /* dl-new-hash.h */ > diff --git a/sysdeps/x86/dl-new-hash.h b/sysdeps/x86/dl-new-hash.h > new file mode 100644 > index 0000000000..ce8fb5a838 > --- /dev/null > +++ b/sysdeps/x86/dl-new-hash.h > @@ -0,0 +1,24 @@ > +/* _dl_new_hash for elf symbol lookup > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifdef __asm_reassociation_barrier > +# error "__asm_reassociation_barrier should never already be defined." > +#endif > + > +#define __asm_reassociation_barrier __asm__ > +#include > -- > 2.34.1 > Should the new _dl_new_hash be placed in sysdeps/x86/dl-new-hash.h and leave the generic one unchanged? -- H.J.