From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from hamster.birch.relay.mailchannels.net (hamster.birch.relay.mailchannels.net [23.83.209.80]) by sourceware.org (Postfix) with ESMTPS id BD4DF385700D for ; Tue, 17 May 2022 05:11:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BD4DF385700D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gotplt.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 449E46A1828; Tue, 17 May 2022 05:11:27 +0000 (UTC) Received: from pdx1-sub0-mail-a304.dreamhost.com (unknown [127.0.0.6]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id 9E6526A1553; Tue, 17 May 2022 05:11:26 +0000 (UTC) ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1652764286; a=rsa-sha256; cv=none; b=c6JHPAdqvU5SJNl8WTEFPm532OFdQpz9KIB7E4U83SIOF6fMCYiyKhEkDD699Eyfpv1j7P v+sF2m1SaEHCmg/RW9bGEErnj3F+G6BtXf2ij25gzbIA1YtO/Q34LK8VskI3Gs/aMuvZyp DxDeqc86A8nT1IxXGTrde2+s04aBJKyFHD8ZfcyQKhpewn8vVxLK0Z6PBJgZKPS5l/1Sr0 xCIsRVooj93DZ/Z2xt2YIffvx2bwpBCsF6d0DQP4+0F64sOb8h4/r2IoCjCyHU64rNmj+g 6ql1WQtt6bWeHX4YEQuStYjO3koosvWrPAV5QCDCZJhchubvptULEzYTuTwCzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=mailchannels.net; s=arc-2022; t=1652764286; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0nlQSFpuYlo8lDneojNDphWFE64YUsN/qUi8jxM6o38=; b=EYh9ZWjIXbIAbr/AUEoqu8inXqMA1qcY2sE3BkpVxrvCeS9V24kSKoefVTumQ9YqqeoaRL G+WPxd5P/8ahTIl7QAWyQzQubIfNRIfJiZYMSyLUWUrxTtSnAYOF9sz8n8IAxBq51q/XIT opM9fWvApi7RB+di0YtNdxrkrrKk4WzMBsZ5Jnm0U5rSqVwzrFwZxmRwJjnz438Qqa01Bx ZTwR4V6kDegicKIhRwAXB1UfQPdvEuqqmnv9Obr2J9i2/dpoTg9V5Y6bmT/8mWyjy/wrlQ nEX+UQpJkSaWSpUOUCPiZxFoLeKW2LzexIpY5WxCP+sm5FsvRwGIL8MlKTMNfw== ARC-Authentication-Results: i=1; rspamd-554c8f6c56-jjm8j; auth=pass smtp.auth=dreamhost smtp.mailfrom=siddhesh@gotplt.org X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org X-MC-Relay: Neutral X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org X-MailChannels-Auth-Id: dreamhost X-Sponge-Cooperative: 492851da5a3e2ba3_1652764286974_4188797012 X-MC-Loop-Signature: 1652764286974:4190591196 X-MC-Ingress-Time: 1652764286973 Received: from pdx1-sub0-mail-a304.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384) by 100.106.158.163 (trex/6.7.1); Tue, 17 May 2022 05:11:26 +0000 Received: from [192.168.1.174] (unknown [1.186.223.88]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a304.dreamhost.com (Postfix) with ESMTPSA id 4L2PNY35Zbz1K6; Mon, 16 May 2022 22:11:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gotplt.org; s=dreamhost; t=1652764286; bh=0nlQSFpuYlo8lDneojNDphWFE64YUsN/qUi8jxM6o38=; h=Date:Subject:To:From:Content-Type:Content-Transfer-Encoding; b=TB4Omtls1JGm3y4smYeE0ePdISYuwRj2b6p2M5+MX439jEOHsHhzzq+rgx0IR/34C FltmeS4QRFIx1fX4bXjuSjjr3JAcWppoZa5kWooMDqY0qy+xq31q3M3bIYDNpKvzKs alt820kkf01eV6SRfsRlTlAH1OJwou3AfBSdDessFBJ73C0sf8eMdLgKeIRj9JzV1G jagT286A70hSgCxU8IIeyRs1ujAP3ajZVl7mEJ8/ap/XSKZOXP4oWIiTf0FS6Ox9JT 7HScvsOfSpFr9wXKfIkL7FJWZqk2T39Tv0KsxKRQZUadi17xuIJp+Bn0wkVpjSiIee fjI3/Vv0IXn9Q== Message-ID: <9d7b9a9a-37ba-6478-e1f2-578bb0540115@gotplt.org> Date: Tue, 17 May 2022 10:41:20 +0530 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH v9 5/6] nss: Optimize nss_hash in nss_hash.c Content-Language: en-US To: Noah Goldstein , libc-alpha@sourceware.org References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220516203004.38687-1-goldstein.w.n@gmail.com> <20220516203004.38687-5-goldstein.w.n@gmail.com> From: Siddhesh Poyarekar In-Reply-To: <20220516203004.38687-5-goldstein.w.n@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3036.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, LIKELY_SPAM_BODY, NICE_REPLY_A, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2022 05:11:30 -0000 On 17/05/2022 02:00, Noah Goldstein via Libc-alpha wrote: > The prior unrolling didn't really do much as it left the dependency > chain between iterations. Unrolled the loop for 4 so 4x multiplies > could be pipelined in out-of-order machines. > > Results for __nss_hash > Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz > > Time as Geometric Mean of N=25 runs > Geometric of all benchmark New / Old: 0.845 > type, length, New Time, Old Time, New Time / Old Time > fixed, 0, 4.019, 3.729, 1.078 > fixed, 1, 4.95, 5.707, 0.867 > fixed, 2, 5.152, 5.657, 0.911 > fixed, 3, 4.641, 5.721, 0.811 > fixed, 4, 5.551, 5.81, 0.955 > fixed, 5, 6.525, 6.552, 0.996 > fixed, 6, 6.711, 6.561, 1.023 > fixed, 7, 6.715, 6.767, 0.992 > fixed, 8, 7.874, 7.915, 0.995 > fixed, 9, 8.888, 9.767, 0.91 > fixed, 10, 8.959, 9.762, 0.918 > fixed, 11, 9.188, 9.987, 0.92 > fixed, 12, 9.708, 10.618, 0.914 > fixed, 13, 10.393, 11.14, 0.933 > fixed, 14, 10.628, 12.097, 0.879 > fixed, 15, 10.982, 12.965, 0.847 > fixed, 16, 11.851, 14.429, 0.821 > fixed, 32, 24.334, 34.414, 0.707 > fixed, 64, 55.618, 86.688, 0.642 > fixed, 128, 118.261, 224.36, 0.527 > fixed, 256, 256.183, 538.629, 0.476 > random, 2, 11.194, 11.556, 0.969 > random, 4, 17.516, 17.205, 1.018 > random, 8, 23.501, 20.985, 1.12 > random, 16, 28.131, 29.212, 0.963 > random, 32, 35.436, 38.662, 0.917 > random, 64, 45.74, 58.868, 0.777 > random, 128, 75.394, 121.963, 0.618 > random, 256, 139.524, 260.726, 0.535 > --- > nss/nss_hash.c | 79 +++++++++++++++++++++++++++----------------------- > 1 file changed, 42 insertions(+), 37 deletions(-) > > diff --git a/nss/nss_hash.c b/nss/nss_hash.c > index 27a348ea9b..c6a375f386 100644 > --- a/nss/nss_hash.c > +++ b/nss/nss_hash.c > @@ -19,58 +19,63 @@ > > /* This is from libc/db/hash/hash_func.c, hash3 is static there */ > /* > - * This is INCREDIBLY ugly, but fast. We break the string up into 8 byte > + * This is INCREDIBLY ugly, but fast. We break the string up into 4 byte > * units. On the first time through the loop we get the "leftover bytes" > - * (strlen % 8). On every other iteration, we perform 8 HASHC's so we handle > - * all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If > - * this routine is heavily used enough, it's worth the ugly coding. > + * (len % 4). On every other iteration, we perform a 4x unrolled version > + * HASHC. Further unrolling does not appear to help. > * > * OZ's original sdbm hash > */ > uint32_t > __nss_hash (const void *keyarg, size_t len) > { > + enum > + { > + HASH_CONST_P0 = 1, /* (uint32_t)(65599 ^ 0). */ > + HASH_CONST_P1 = 65599, /* (uint32_t)(65599 ^ 1). */ > + HASH_CONST_P2 = 8261505, /* (uint32_t)(65599 ^ 2). */ > + HASH_CONST_P3 = 780587199, /* (uint32_t)(65599 ^ 3). */ > + HASH_CONST_P4 = 1139564289 /* (uint32_t)(65599 ^ 4). */ > + }; > + > const unsigned char *key; > - size_t loop; > uint32_t h; > > -#define HASHC h = *key++ + 65599 * h > +#define HASHC h = *key++ + HASH_CONST_P1 * h > > h = 0; > key = keyarg; > if (len > 0) > { > - loop = (len + 8 - 1) >> 3; > - switch (len & (8 - 1)) > - { > - case 0: > - do > - { > - HASHC; > - /* FALLTHROUGH */ > - case 7: > - HASHC; > - /* FALLTHROUGH */ > - case 6: > - HASHC; > - /* FALLTHROUGH */ > - case 5: > - HASHC; > - /* FALLTHROUGH */ > - case 4: > - HASHC; > - /* FALLTHROUGH */ > - case 3: > - HASHC; > - /* FALLTHROUGH */ > - case 2: > - HASHC; > - /* FALLTHROUGH */ > - case 1: > - HASHC; > - } > - while (--loop); > - } > + switch ((len & (4 - 1))) > + { > + case 0: > + /* h starts out as zero so no need to include the multiply. */ > + h = *key++; > + /* FALLTHROUGH */ > + case 3: > + HASHC; > + /* FALLTHROUGH */ > + case 2: > + HASHC; > + /* FALLTHROUGH */ > + case 1: > + HASHC; > + /* FALLTHROUGH */ > + } The first 4 bytes, also sufficient for len <= 4. OK. > + > + uint32_t c0, c1, c2, c3; > + for (--len; len >= 4; len -= 4) > + { > + c0 = (unsigned char) *(key + 0); > + c1 = (unsigned char) *(key + 1); > + c2 = (unsigned char) *(key + 2); > + c3 = (unsigned char) *(key + 3); > + h = HASH_CONST_P4 * h + HASH_CONST_P3 * c0 + HASH_CONST_P2 * c1 > + + HASH_CONST_P1 * c2 + HASH_CONST_P0 * c3; > + > + key += 4; > + } Remaining larger lengths. OK. > } > return h; > } TBH this wins solely on the front of the code being easier to understand. The fact that it is also faster in some cases is a bonus :) LGTM. Reviewed-by: Siddhesh Poyarekar