From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from buffalo.birch.relay.mailchannels.net (buffalo.birch.relay.mailchannels.net [23.83.209.24]) by sourceware.org (Postfix) with ESMTPS id A2B8838515F3 for ; Thu, 19 May 2022 07:53:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A2B8838515F3 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gotplt.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 318F8560BDB; Thu, 19 May 2022 07:53:24 +0000 (UTC) Received: from pdx1-sub0-mail-a304.dreamhost.com (unknown [127.0.0.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id E235D560750; Thu, 19 May 2022 07:53:22 +0000 (UTC) ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1652946803; a=rsa-sha256; cv=none; b=TStDrLwnBmzEd/WBZh0ncfATgUsUmH+NThIZU+uzVZCdVMr6lowePugSvPng2E6UimBTXQ Xj9nbII+LXr9hO0KM0R3q8lfY/zsjTjj+ZeEp7Gy/EzggY6TQj8mJbgE84rtwjTTZkcK57 hoPa9NQoqLXLrl8q8KLCaVi7ygQKa4eF8f0a7zI8pLQb6yZeCj5VIUrTECuDwf8ODofTl5 7dNjrlWl1QuJvEWzS5WtzSR9Vg9LFtkf5+Pc7WfX5ajbmQNMw0EK7v4/bW+3mPL8wNkw7K oXBBsy8xGNdG2JW6di0dViJsz/dCBj+mAcTEJIYwh6Q5IiNfLPRp+v2Zhd4izQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=mailchannels.net; s=arc-2022; t=1652946803; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=I9OiTclAQLuX2kV0QeBUM2v3kLrBhKcP+UFA8A9sOv8=; b=X/leBMBDbJoPwM8uR31Buylx6R0OiIs8uRqLlJRcJ/7Y1dh10928Ppu/vbl5fyVBdxD669 mktN19c1CRmBd77tYMqf2gXCMOmkLsiXgTb6fhPRuowKif65VbsUXBGDgldBizcvm5v1Md y0dOTcg+YPlfQJSBGr8r9XG1vdL3q/V9V3GaTxSau2pL3C5L068YqPQoJzO3xbvNMzJl2O LQyeEduX15KPpub7C73IkpOiwNVT51a3vaq6q97191A5F5VeHjMEghUjaqIYcXBNy5Gk6g 0qCsZ85Wv6Ff8MlToeThRq1tSz8ta7bscCpqilX6mYCCp7WJ0ddoOxJcfpHf7g== ARC-Authentication-Results: i=1; rspamd-554c8f6c56-b2z7l; auth=pass smtp.auth=dreamhost smtp.mailfrom=siddhesh@gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org X-MailChannels-Auth-Id: dreamhost X-Gusty-Shrill: 29e9366879c32113_1652946804013_381288834 X-MC-Loop-Signature: 1652946804013:4111186256 X-MC-Ingress-Time: 1652946804013 Received: from pdx1-sub0-mail-a304.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384) by 100.106.158.163 (trex/6.7.1); Thu, 19 May 2022 07:53:24 +0000 Received: from [100.96.149.178] (unknown [49.248.235.144]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a304.dreamhost.com (Postfix) with ESMTPSA id 4L3htS75mxz1NC; Thu, 19 May 2022 00:53:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gotplt.org; s=dreamhost; t=1652946802; bh=I9OiTclAQLuX2kV0QeBUM2v3kLrBhKcP+UFA8A9sOv8=; h=Date:Subject:To:Cc:From:Content-Type:Content-Transfer-Encoding; b=UMS7tG8d65RKs8GLRC3hVamjK6r96IK9nPZwI3iGNQXjll1AaNxVWUp2uf3jiKjOG F4/DraHuOZa04yFabTa1xnnNcpSavqz0JTEAh58HuCkNST+Ijoyb+FVBqErQIpdsW2 8vu2ciCHLwg6NwU0+RLf/jDm79V5dBqKU/jTSW/d6y0snrEfXISSGAkoKnsg1dRpX7 ukGZoWLOsAYozeDjQmii/83xwbsw1tUnAuMEeW+DZRnfbwl79qNCOme7iB9w9ME+lK IJEbvK9rlE/2wJ2QwhYlfNdUorYA6nO9rEsnwcJsgzvDQNbbOfPH8OKTHJWr4Iyx/v cYpU/XlZZgJDQ== Message-ID: <64bb8f37-d09d-2ae6-6b50-ef64eecb3787@gotplt.org> Date: Thu, 19 May 2022 13:23:17 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH v10 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Content-Language: en-US To: "H.J. Lu" , Noah Goldstein Cc: Alexander Monakov , GNU C Library References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220518172635.291119-1-goldstein.w.n@gmail.com> <20220518172635.291119-6-goldstein.w.n@gmail.com> From: Siddhesh Poyarekar In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3038.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 May 2022 07:53:28 -0000 On 18/05/2022 23:02, H.J. Lu via Libc-alpha wrote: > andOn Wed, May 18, 2022 at 10:26 AM Noah Goldstein > wrote: >> >> Unroll slightly and enforce good instruction scheduling. This improves >> performance on out-of-order machines. The unrolling allows for >> pipelined multiplies. >> >> As well, as an optional sysdep, reorder the operations and prevent >> reassosiation for better scheduling and higher ILP. This commit >> only adds the barrier for x86, although it should be either no >> change or a win for any architecture. >> >> Unrolling further started to induce slowdowns for sizes [0, 4] >> but can help the loop so if larger sizes are the target further >> unrolling can be beneficial. >> >> Results for _dl_new_hash >> Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz >> >> Time as Geometric Mean of N=30 runs >> Geometric of all benchmark New / Old: 0.674 >> type, length, New Time, Old Time, New Time / Old Time >> fixed, 0, 2.865, 2.72, 1.053 >> fixed, 1, 3.567, 2.489, 1.433 >> fixed, 2, 2.577, 3.649, 0.706 >> fixed, 3, 3.644, 5.983, 0.609 >> fixed, 4, 4.211, 6.833, 0.616 >> fixed, 5, 4.741, 9.372, 0.506 >> fixed, 6, 5.415, 9.561, 0.566 >> fixed, 7, 6.649, 10.789, 0.616 >> fixed, 8, 8.081, 11.808, 0.684 >> fixed, 9, 8.427, 12.935, 0.651 >> fixed, 10, 8.673, 14.134, 0.614 >> fixed, 11, 10.69, 15.408, 0.694 >> fixed, 12, 10.789, 16.982, 0.635 >> fixed, 13, 12.169, 18.411, 0.661 >> fixed, 14, 12.659, 19.914, 0.636 >> fixed, 15, 13.526, 21.541, 0.628 >> fixed, 16, 14.211, 23.088, 0.616 >> fixed, 32, 29.412, 52.722, 0.558 >> fixed, 64, 65.41, 142.351, 0.459 >> fixed, 128, 138.505, 295.625, 0.469 >> fixed, 256, 291.707, 601.983, 0.485 >> random, 2, 12.698, 12.849, 0.988 >> random, 4, 16.065, 15.857, 1.013 >> random, 8, 19.564, 21.105, 0.927 >> random, 16, 23.919, 26.823, 0.892 >> random, 32, 31.987, 39.591, 0.808 >> random, 64, 49.282, 71.487, 0.689 >> random, 128, 82.23, 145.364, 0.566 >> random, 256, 152.209, 298.434, 0.51 >> >> Co-authored-by: Alexander Monakov >> --- >> benchtests/bench-dl-new-hash.c | 3 +- >> elf/{dl-new-hash.h => simple-dl-new-hash.h} | 20 ++-- >> elf/tst-dl-hash.c | 1 + >> sysdeps/generic/dl-new-hash.h | 111 ++++++++++++++++++++ >> sysdeps/x86/dl-new-hash.h | 24 +++++ >> 5 files changed, 146 insertions(+), 13 deletions(-) >> rename elf/{dl-new-hash.h => simple-dl-new-hash.h} (75%) >> create mode 100644 sysdeps/generic/dl-new-hash.h >> create mode 100644 sysdeps/x86/dl-new-hash.h >> >> diff --git a/benchtests/bench-dl-new-hash.c b/benchtests/bench-dl-new-hash.c >> index 3c8a1d5a82..040fa7ce01 100644 >> --- a/benchtests/bench-dl-new-hash.c >> +++ b/benchtests/bench-dl-new-hash.c >> @@ -16,7 +16,8 @@ >> License along with the GNU C Library; if not, see >> . */ >> >> -#include >> +#include >> +#include >> #define TEST_FUNC(x, y) _dl_new_hash (x) >> #define SIMPLE_TEST_FUNC(x, y) __simple_dl_new_hash (x) >> >> diff --git a/elf/dl-new-hash.h b/elf/simple-dl-new-hash.h >> similarity index 75% >> rename from elf/dl-new-hash.h >> rename to elf/simple-dl-new-hash.h >> index 8641bb4196..1437b1bd36 100644 >> --- a/elf/dl-new-hash.h >> +++ b/elf/simple-dl-new-hash.h >> @@ -1,4 +1,4 @@ >> -/* _dl_new_hash for elf symbol lookup >> +/* __simple_dl_new_hash for testing true elf symbol lookup. >> Copyright (C) 2022 Free Software Foundation, Inc. >> This file is part of the GNU C Library. >> >> @@ -16,16 +16,16 @@ >> License along with the GNU C Library; if not, see >> . */ >> >> -#ifndef _DL_NEW_HASH_H >> -#define _DL_NEW_HASH_H 1 >> +#ifndef _SIMPLE_DL_NEW_HASH_H >> +#define _SIMPLE_DL_NEW_HASH_H 1 >> >> #include >> -/* For __always_inline. */ >> -#include >> >> -static __always_inline uint32_t >> +/* For testing/benchmarking purposes. Real implementation in >> + sysdeps/generic/dl-new-hash.h. */ >> +static uint32_t >> __attribute__ ((unused)) >> -_dl_new_hash (const char *s) >> +__simple_dl_new_hash (const char *s) >> { >> uint32_t h = 5381; >> for (unsigned char c = *s; c != '\0'; c = *++s) >> @@ -33,8 +33,4 @@ _dl_new_hash (const char *s) >> return h; >> } >> >> -/* For testing/benchmarking purposes. */ >> -#define __simple_dl_new_hash _dl_new_hash >> - >> - >> -#endif /* dl-new-hash.h */ >> +#endif /* simple-dl-new-hash.h */ >> diff --git a/elf/tst-dl-hash.c b/elf/tst-dl-hash.c >> index 8697eb73a0..b21766c63d 100644 >> --- a/elf/tst-dl-hash.c >> +++ b/elf/tst-dl-hash.c >> @@ -18,6 +18,7 @@ >> >> >> #include >> +#include >> #include >> #include >> #include >> diff --git a/sysdeps/generic/dl-new-hash.h b/sysdeps/generic/dl-new-hash.h >> new file mode 100644 >> index 0000000000..1faf309c97 >> --- /dev/null >> +++ b/sysdeps/generic/dl-new-hash.h >> @@ -0,0 +1,111 @@ >> +/* _dl_new_hash for elf symbol lookup >> + Copyright (C) 2022 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#ifndef _DL_NEW_HASH_H >> +#define _DL_NEW_HASH_H 1 >> + >> +#include >> +/* For __always_inline. */ >> +#include >> +/* For __glibc_unlikely. */ >> +#include >> + >> +/* The simplest implementation of _dl_new_hash is: >> + >> + _dl_new_hash (const char *s) >> + { >> + uint32_t h = 5381; >> + for (unsigned char c = *s; c != '\0'; c = *++s) >> + h = h * 33 + c; >> + return h; >> + } >> + >> + We can get better performance by slightly unrolling the loop to >> + pipeline the multiples, which gcc cannot easily do due to >> + dependencies across iterations. >> + >> + As well, as an architecture specific option we add asm statements >> + to explicitly specify order of operations and prevent reassociation >> + of instructions that lengthens the loop carried dependency. This >> + may have no affect as the compiler may have ordered instructions >> + the same way without it but in testing this has not been the case >> + for GCC. Improving GCC to reliably schedule instructions ideally >> + cannot be easily done. >> + >> + Architecture(s) that use the reassociation barries are: >> + x86 >> + >> + Note it is very unlikely the reassociation barriers would >> + de-optimize performance on any architecture and with an imperfect >> + compiler it may help performance, especially on out-of-order cpus, >> + so it is suggested that the respective maintainers add them. >> + >> + architecture maintainers are encouraged to benchmark this with >> + __asm_reassociation_barrier defined to __asm__ like it is in x86. >> +*/ >> + >> + >> +#ifndef __asm_reassociation_barrier >> +# define __asm_reassociation_barrier(...) >> +#endif >> + >> +static __always_inline uint32_t >> +__attribute__ ((unused)) >> +_dl_new_hash (const char *str) >> +{ >> + const unsigned char *s = (const unsigned char *) str; >> + unsigned int h = 5381; >> + unsigned int c0, c1; >> + for (;;) >> + { >> + c0 = s[0]; >> + /* Since hashed string is normally not empty, this is unlikely on the >> + first iteration of the loop. */ >> + if (__glibc_unlikely (c0 == 0)) >> + return h; >> + >> + c1 = s[1]; >> + if (c1 == 0) >> + { >> + /* Ideal computational order is: >> + c0 += h; >> + h *= 32; >> + h += c0; */ >> + c0 += h; >> + __asm_reassociation_barrier("" : "+r"(h) : "r"(c0)); >> + h = h * 32 + c0; >> + return h; >> + } >> + >> + /* Ideal computational order is: >> + c1 += c0; >> + h *= 33 * 33; >> + c0 *= 32; >> + c1 += c0; >> + h += c1; */ >> + c1 += c0; >> + __asm_reassociation_barrier("" : "+r"(c1), "+r"(c0)); >> + h *= 33 * 33; >> + c1 += c0 * 32; >> + __asm_reassociation_barrier("" : "+r"(c1)); >> + h += c1; >> + s += 2; >> + } >> +} >> + >> +#endif /* dl-new-hash.h */ >> diff --git a/sysdeps/x86/dl-new-hash.h b/sysdeps/x86/dl-new-hash.h >> new file mode 100644 >> index 0000000000..ce8fb5a838 >> --- /dev/null >> +++ b/sysdeps/x86/dl-new-hash.h >> @@ -0,0 +1,24 @@ >> +/* _dl_new_hash for elf symbol lookup >> + Copyright (C) 2022 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#ifdef __asm_reassociation_barrier >> +# error "__asm_reassociation_barrier should never already be defined." >> +#endif >> + >> +#define __asm_reassociation_barrier __asm__ >> +#include >> -- >> 2.34.1 >> > > Should the new _dl_new_hash be placed in sysdeps/x86/dl-new-hash.h > and leave the generic one unchanged? > There are 3 implementations: the reference one in elf/dl-new-hash.h that's retained for verification, the optimized one in sysdeps-generic/dl-new-hash.h that is suitable for all architectures and the micro-optimized one with optimized schedule for x86, giving it that little bit more. Siddhesh