From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by sourceware.org (Postfix) with ESMTPS id C9364383D821 for ; Mon, 16 May 2022 14:31:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C9364383D821 Received: from [10.10.3.121] (unknown [10.10.3.121]) by mail.ispras.ru (Postfix) with ESMTPS id 9F37840D403E; Mon, 16 May 2022 14:31:49 +0000 (UTC) Date: Mon, 16 May 2022 17:31:49 +0300 (MSK) From: Alexander Monakov To: Siddhesh Poyarekar cc: Noah Goldstein , libc-alpha@sourceware.org Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h In-Reply-To: <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org> Message-ID: References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220511030635.154689-1-goldstein.w.n@gmail.com> <20220511030635.154689-6-goldstein.w.n@gmail.com> <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2022 14:32:00 -0000 On Mon, 16 May 2022, Siddhesh Poyarekar wrote: > There are a couple of things that seem problematic to me about this: > > - It seems like we're trying to fix a gcc issue in glibc. Couldn't we file a > gcc bug and explore ways in which this could be supported in the compiler? In > fact, it might make sense to do that for the original loop; it looks like a > missed optimization that gcc ought to fix. IMO the bug should be filed even > if we do end up with this micro-optimization in glibc. This issue involves a chain of dependencies that goes across all loop iterations, but relevant compiler optimization (reassociation, register allocation, scheduling) do not consider such global chains. You might file this as a "wishlist" bug, but compiler infrastructure is simply not designed to make such nontrivial decisions. > - The patch controls an instruction schedule so that it works well on > out-of-order processors but then only quoting one microarchitecture. It's not specific to out-of-order processors: a long chain of dependencies restricts OoO scheduling in the CPU. So in the end it benefits "classic" and OoO pipelines in a similar fashion. > If it > works well on TigerLake (and on x86 in general) then it might be better to add > it as a sysdep override; I assumed that was the point of breaking the function > out into its header anyway. If it is more generally useful then please share > numbers to that effect in the commit message and also explicitly state in the > comments why we're trying to exert this level of control on codegen in generic > C code and why it is good for all architectures. I guess it's up to you and Noah to hash it out, but I'd like to remind that there was an alternative variant which is a strict win on all architectures (same code size, same instruction mix, no dependency on fast multiplication). That might be easier to justify from generic code point of view. Alexander