From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from donkey.elm.relay.mailchannels.net (donkey.elm.relay.mailchannels.net [23.83.212.49]) by sourceware.org (Postfix) with ESMTPS id 10AF2385735F for ; Tue, 17 May 2022 05:12:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 10AF2385735F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gotplt.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 9FF166A19D4; Tue, 17 May 2022 05:12:57 +0000 (UTC) Received: from pdx1-sub0-mail-a304.dreamhost.com (unknown [127.0.0.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id AFF636A1489; Tue, 17 May 2022 05:12:56 +0000 (UTC) ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1652764376; a=rsa-sha256; cv=none; b=V2VAcnrnEYbi2jvkg48a6mjzs+BFAoyyBteoPgd+Dh6GRsuQUZglo8N2XMEfZaZuHYzd6x bMyknqrYDKEojVioiq3ZkU3kw2uXj0+xCZ4OHrGOAa822t53X9SsSQpu2g3ftG/Ur4vLtK 6mNYSNRMnTHPBOB3ukupzJh7VP78xlRmnbIvGkcegSfT8ufqmVMol1GAlH1pSLt+AxVzv/ sCdw0DkMV7/oPGQD+o5Q5PgnbGdQo5nUdvZ8A/pz4QPu+rQtQ3p17Vm+iKDRCaZMGwTWh8 ByhCD+AbqM8mYVR0Nzp4nxGxz5fTCiuEq84yjbGN1oqAZ7gz9n5M6z+03jVG1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=mailchannels.net; s=arc-2022; t=1652764376; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0iHR+VKjxha2jI/gFbJsB5iO7PTDEGFKl/7Q+pQAG3I=; b=aMUTeOQ+zkns9LzZdg7SlS8PclBeb70Z4230YDaSw8xSPu691WgNk9ZnAlEHETbTbjKDX2 v09fy89pt+OZnHjSKtiQDK2s0BHE9sFaxgzUjR60IrWlaPIS0uj29uhq2/kHpRFt4JsUJO IWESvAr8mu4m9FqKYbQG3EXZQmv6CTrzVVKJAjhRQVNrjH/niltj2/jXM/nzv7JRG/inK7 jyJE+G/jAivHN4ugKg2q9+NpZs7Fz8r9oTQANpPzNxguMsnVdK07WV7ei7c+Vn8OqrWsnO pFOI/rLwmLU7+BPG1ZCS7EYw3fXeW04q8ronaYsvfe8p9q616bDYnHqpyIknYQ== ARC-Authentication-Results: i=1; rspamd-554c8f6c56-qg22c; auth=pass smtp.auth=dreamhost smtp.mailfrom=siddhesh@gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org X-MailChannels-Auth-Id: dreamhost X-Keen-Tangy: 68f6ab5729177f58_1652764377400_2058412248 X-MC-Loop-Signature: 1652764377400:1633831956 X-MC-Ingress-Time: 1652764377400 Received: from pdx1-sub0-mail-a304.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384) by 100.116.106.119 (trex/6.7.1); Tue, 17 May 2022 05:12:57 +0000 Received: from [192.168.1.174] (unknown [1.186.223.88]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a304.dreamhost.com (Postfix) with ESMTPSA id 4L2PQG6Pzkz1P7; Mon, 16 May 2022 22:12:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gotplt.org; s=dreamhost; t=1652764376; bh=YGsI6y3Arlc+eVE9UnNiJ5Fs4yTCtXVIL43jbXKMNYA=; h=Date:Subject:To:Cc:From:Content-Type:Content-Transfer-Encoding; b=movsUaFnmH9vy3EO+xIBNSTZN0osfzYo61XSGUWGsE6dB9ZEXB5lTSHfMK0jC6QFq AFyuaENY3cuhZ+MhgKhXH2P8omqE+P8lqvte/B1YKR2qWzwElZ5IGzpwi5J+NK74fG 3taFwLcA7LOfxiPHK3imWKwlL0LGLmp8Y1P7iSWitqoqR1NZIO04JvmhRsz9MtibBl Y55D2t0TRPpToLMh9piRaFsUVe1g7cyU4wOnxS29mLEchcIqU42n1G1+6dBKrCz6Sm hsM2uyg+vCxD5VlPxf+CVzrpGFwbQAMB3UMxPav69vVIDBGi5Cp06TejoQrMTYOc2d KHn/O6FQ0Bdfg== Message-ID: <0020ab3f-748a-f02b-da06-2b793694e168@gotplt.org> Date: Tue, 17 May 2022 10:42:49 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH v9 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Content-Language: en-US To: Noah Goldstein , libc-alpha@sourceware.org Cc: Alexander Monakov References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220516203004.38687-1-goldstein.w.n@gmail.com> <20220516203004.38687-6-goldstein.w.n@gmail.com> From: Siddhesh Poyarekar In-Reply-To: <20220516203004.38687-6-goldstein.w.n@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3037.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2022 05:13:01 -0000 Not sure why, but the series failed to apply on trybot. It's probably a trybot bug because it applied just fine on my up to date copy. On 17/05/2022 02:00, Noah Goldstein via Libc-alpha wrote: > Unroll slightly and enforce good instruction scheduling. This improves > performance on out-of-order machines. The unrolling allows for > pipelined multiplies. > > As well, as an optional sysdep, reorder the operations and prevent > reassosiation for better scheduling and higher ILP. This commit reassociation. Later in the patch too. > only adds the barrier for x86, although it should be either no > change or a win for any architecture. > > Unrolling further started to induce slowdowns for sizes [0, 4] > but can help the loop so if larger sizes are the target further > unrolling can be beneficial. > > Results for _dl_new_hash > Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz > > Time as Geometric Mean of N=30 runs > Geometric of all benchmark New / Old: 0.674 > type, length, New Time, Old Time, New Time / Old Time > fixed, 0, 2.865, 2.72, 1.053 > fixed, 1, 3.567, 2.489, 1.433 > fixed, 2, 2.577, 3.649, 0.706 > fixed, 3, 3.644, 5.983, 0.609 > fixed, 4, 4.211, 6.833, 0.616 > fixed, 5, 4.741, 9.372, 0.506 > fixed, 6, 5.415, 9.561, 0.566 > fixed, 7, 6.649, 10.789, 0.616 > fixed, 8, 8.081, 11.808, 0.684 > fixed, 9, 8.427, 12.935, 0.651 > fixed, 10, 8.673, 14.134, 0.614 > fixed, 11, 10.69, 15.408, 0.694 > fixed, 12, 10.789, 16.982, 0.635 > fixed, 13, 12.169, 18.411, 0.661 > fixed, 14, 12.659, 19.914, 0.636 > fixed, 15, 13.526, 21.541, 0.628 > fixed, 16, 14.211, 23.088, 0.616 > fixed, 32, 29.412, 52.722, 0.558 > fixed, 64, 65.41, 142.351, 0.459 > fixed, 128, 138.505, 295.625, 0.469 > fixed, 256, 291.707, 601.983, 0.485 > random, 2, 12.698, 12.849, 0.988 > random, 4, 16.065, 15.857, 1.013 > random, 8, 19.564, 21.105, 0.927 > random, 16, 23.919, 26.823, 0.892 > random, 32, 31.987, 39.591, 0.808 > random, 64, 49.282, 71.487, 0.689 > random, 128, 82.23, 145.364, 0.566 > random, 256, 152.209, 298.434, 0.51 > > Co-authored-by: Alexander Monakov > --- > sysdeps/generic/dl-new-hash.h | 114 +++++++++++++++++++++++++++++ > {elf => sysdeps/x86}/dl-new-hash.h | 16 +--- This breaks the benchmark build, but including just dl-new-hash.h in the benchmark should fix it. > 2 files changed, 117 insertions(+), 13 deletions(-) > create mode 100644 sysdeps/generic/dl-new-hash.h > rename {elf => sysdeps/x86}/dl-new-hash.h (77%) > > diff --git a/sysdeps/generic/dl-new-hash.h b/sysdeps/generic/dl-new-hash.h > new file mode 100644 > index 0000000000..84aa7991a4 > --- /dev/null > +++ b/sysdeps/generic/dl-new-hash.h > @@ -0,0 +1,114 @@ > +/* _dl_new_hash for elf symbol lookup > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _DL_NEW_HASH_H > +#define _DL_NEW_HASH_H 1 > + > +#include > +/* For __always_inline. */ > +#include > +/* For __glibc_unlikely. */ > +#include Same header included twice. > + > +/* The simplest implementation of _dl_new_hash is: > + > + _dl_new_hash (const char *s) > + { > + uint32_t h = 5381; > + for (unsigned char c = *s; c != '\0'; c = *++s) > + h = h * 33 + c; > + return h; > + } > + > + We can get better performance by slightly unrolling the > + loop to pipeline the multiples. .. the multiples, which gcc cannot easily do due to dependencies across iterations. > + As well, as an architecture specific option we add asm statements > + to explicitly specifying order of operations to prevent to explicitly specify the order... > + reassosiation of instructions that lengthens the loop carried > + dependency. This may have no affect as the compiler may have > + ordered instructions the same way without it but in testing this > + has not been the case for GCC. Improving GCC to reliably schedule > + instructions ideally cannot be easily done. > + > + Architecture(s) that use the reassosiation barries are: > + x86 > + > + Note it is very unlikely the reassosiation barriers would > + de-optimize performance on any archictecture and with an imperfect architecture > + compiler it may help performance, especially on out-of-order cpus, > + so it is suggested that the respective maintainers add them. */ Suggest: "architecture maintainers are encouraged to benchmark this with __asm_reassociation_barrier defined to __asm__ like it is in x86." > + > + > +#ifndef __asm_reassociation_barrier > +# define __asm_reassociation_barrier(...) > +#endif > + > +static __always_inline uint32_t > +__attribute__ ((unused)) > +_dl_new_hash (const char *str) > +{ > + const unsigned char *s = (const unsigned char *) str; > + unsigned int h = 5381; > + unsigned int c0, c1; > + for (;;) > + { > + c0 = s[0]; > + /* Since hashed string is normally not empty, this is unlikely on the > + first iteration of the loop. */ > + if (__glibc_unlikely (c0 == 0)) > + return h; > + > + c1 = s[1]; > + if (c1 == 0) > + { > + /* Ideal instruction scheduling is: Suggest: "Ideal computation order is" > + c0 += h; > + h *= 32; > + h += c0; > + > + The __asm_reassociation_barrier() macro is a sysdep optional asm > + statements to prevents reassosiation that would result in more > + instruction interdependencies and worse scheduling. */ This bit is redundant with the description at the top near __asm_reassociation_barrier. > + c0 += h; > + __asm_reassociation_barrier("" : "+r"(h) : "r"(c0)); > + h = h * 32 + c0; > + return h; > + } > + > + /* Ideal instruction scheduling is: Same: "Ideal computation order is" > + c1 += c0; > + h *= 33 * 33; > + c0 *= 32; > + c1 += c0; > + h += c1; > + > + The __asm_reassociation_barrier() macro is a sysdep optional asm > + statements to prevents reassosiation that would result in more > + instruction interdependencies and worse scheduling. */ This too is redundant. > + c1 += c0; > + __asm_reassociation_barrier("" : "+r"(c1), "+r"(c0)); > + h *= 33 * 33; > + c1 += c0 * 32; > + __asm_reassociation_barrier("" : "+r"(c1)); > + h += c1; > + s += 2; > + } > +} > + > +#endif /* dl-new-hash.h */ > diff --git a/elf/dl-new-hash.h b/sysdeps/x86/dl-new-hash.h > similarity index 77% > rename from elf/dl-new-hash.h > rename to sysdeps/x86/dl-new-hash.h > index b7a91ecc07..dd800265bf 100644 > --- a/elf/dl-new-hash.h > +++ b/sysdeps/x86/dl-new-hash.h > @@ -19,19 +19,9 @@ > #ifndef _DL_NEW_HASH_H > #define _DL_NEW_HASH_H 1 No need to define this... > > -#include > -/* For __always_inline. */ > -#include > - > -static __always_inline uint32_t > -__attribute__ ((unused)) > -_dl_new_hash (const char *s) > -{ > - uint32_t h = 5381; > - for (unsigned char c = *s; c != '\0'; c = *++s) > - h = h * 33 + c; > - return h; > -} > +#define __asm_reassociation_barrier __asm__ > > +#undef _DL_NEW_HASH_H ... if it is unconditionally undefined here. > +#include > > #endif /* dl-new-hash.h */