Message-ID: <77e77937-98b1-b4e6-dd29-cadd261cef44@gotplt.org>
Date: Mon, 16 May 2022 21:53:15 +0530
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
To: Alexander Monakov
Cc: Noah Goldstein, libc-alpha@sourceware.org
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-6-goldstein.w.n@gmail.com>
 <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
From: Siddhesh Poyarekar
List-Id: Libc-alpha mailing list

On 16/05/2022 20:01, Alexander Monakov wrote:
> On Mon, 16 May 2022, Siddhesh Poyarekar wrote:
>
>> There are a couple of things that seem problematic to me about this:
>>
>> - It seems like we're trying to fix a gcc issue in glibc.  Couldn't we file a
>> gcc bug and explore ways in which this could be supported in the compiler?  In
>> fact, it might make sense to do that for the original loop; it looks like a
>> missed optimization that gcc ought to fix.  IMO the bug should be filed even
>> if we do end up with this micro-optimization in glibc.
>
> This issue involves a chain of dependencies that goes across all loop
> iterations, but relevant compiler optimization (reassociation, register
> allocation, scheduling) do not consider such global chains. You might
> file this as a "wishlist" bug, but compiler infrastructure is simply
> not designed to make such nontrivial decisions.

Thanks for the context, this should go into comments.
A wishlist bug would be nice but I suspect it'll just gather dust.
Maybe it's still useful for someone coming in after 10-15 years looking
for more context on it.

>> - The patch controls an instruction schedule so that it works well on
>> out-of-order processors but then only quoting one microarchitecture.
>
> It's not specific to out-of-order processors: a long chain of dependencies
> restricts OoO scheduling in the CPU. So in the end it benefits "classic"
> and OoO pipelines in a similar fashion.
>
>> If it
>> works well on TigerLake (and on x86 in general) then it might be better to add
>> it as a sysdep override; I assumed that was the point of breaking the function
>> out into its header anyway.  If it is more generally useful then please share
>> numbers to that effect in the commit message and also explicitly state in the
>> comments why we're trying to exert this level of control on codegen in generic
>> C code and why it is good for all architectures.
>
> I guess it's up to you and Noah to hash it out, but I'd like to remind that
> there was an alternative variant which is a strict win on all architectures
> (same code size, same instruction mix, no dependency on fast multiplication).
> That might be easier to justify from generic code point of view.

I would prefer the earlier variant in generic code, with (if necessary)
the scheduling hack being a sysdep for x86.  Other architectures that
want to use the latter should #include it and also post microbenchmark
results so that we keep track of how we arrived at that decision.

Thanks,
Siddhesh
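
For anyone reading this thread later without the patch at hand, here is a
rough sketch of the loop-carried dependency chain being discussed.  It is
not the code from the patch, nor the alternative variant mentioned above;
the function names are made up for illustration.  The first function is the
usual GNU hash loop (h = h * 33 + c, seeded with 5381), where every
iteration waits on the previous value of h.  The second consumes two bytes
per iteration, so the c0 * 33 + c1 term does not depend on h and only one
multiply-add per pair of bytes sits on the chain.  Note that this
particular sketch does rely on a reasonably fast multiply.

```c
#include <stdint.h>

/* Baseline GNU hash loop: each step depends on the previous h.  */
static uint32_t
gnu_hash_baseline (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint32_t h = 5381;
  for (; *p != '\0'; p++)
    h = h * 33 + *p;
  return h;
}

/* Hypothetical two-bytes-per-iteration variant (illustration only).
   (h * 33 + c0) * 33 + c1 == h * 1089 + (c0 * 33 + c1) modulo 2^32,
   and the parenthesised term can be computed independently of h.  */
static uint32_t
gnu_hash_two_bytes (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint32_t h = 5381;
  while (p[0] != '\0')
    {
      /* Odd-length tail: fold in the last byte the ordinary way.  */
      if (p[1] == '\0')
        return h * 33 + p[0];
      uint32_t pair = (uint32_t) p[0] * 33 + p[1];  /* off the h chain */
      h = h * (33 * 33) + pair;                     /* on the h chain */
      p += 2;
    }
  return h;
}
```

Both functions should return the same value for any NUL-terminated string,
so comparing them over a list of symbol names is a cheap sanity check
before looking at any timing numbers.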