Date: Mon, 16 May 2022 22:28:04 +0300 (MSK)
From: Alexander Monakov
To: Siddhesh Poyarekar
cc: Noah Goldstein, libc-alpha@sourceware.org
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h

On Tue, 17 May 2022, Siddhesh Poyarekar wrote:

> On 16/05/2022 23:39, Alexander Monakov wrote:
> > On Mon, 16 May 2022, Siddhesh Poyarekar wrote:
> >
> >> There are a couple of things that seem problematic to me about this:
> > [snip]
> >
> > Since Carlos mentioned in today's patch review message that you didn't like
> > something about the asms, can you (or Adhemerval) please explain what you
> > meant? The asms in the patch have empty body and standard "r" constraints,
> > I'd say they are perfectly portable.
>
> I did explain; I am not comfortable with controlling instruction scheduling in
> that manner for generic code because it assumes more about underlying
> processor pipelines and instruction sequences than we typically do in generic
> code. It has nothing to do with portability. Adhemerval raised the question
> about whether this ought to be done in gcc instead, which I concurred with
> too.

Thank you very much for the detailed response. Allow me to clear up what seems
to be a technical misunderstanding here: this is not about instruction
scheduling, but rather about dependencies in the computations (I know Noah
mentioned scheduling, but that is confusing, especially in the context of
benchmarking for an out-of-order CPU).

I have shown how different variants have different chains of dependencies in
this email:
https://sourceware.org/pipermail/libc-alpha/2022-May/138495.html

The empty asms are used to prevent the compiler from reassociating
'h*32 + (h + c)' to '(h*32 + h) + c', which looks fine in isolation, but
significantly changes the dependency graph in the context of the whole loop.

There's nothing specific to the x86 architecture in this reasoning. On arm and
aarch64 it's moot, though, because they evaluate 'h*32 + h' in a single cycle.

Alexander
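
For illustration, the reassociation point above can be sketched roughly as
follows. This is a simplified, one-character-per-iteration version and not the
code from the patch; the function names hash_ref, hash_barrier and check_same
are made up for this example, and the "+r" form of the constraint is an
assumption (the thread only says the asms have an empty body and "r"
constraints). It relies on the GCC/Clang extended asm extension.

```c
#include <assert.h>
#include <stdint.h>

/* Plain form: h = h*33 + c.  The compiler is free to evaluate this as
   (h*32 + h) + c, which puts the multiply-by-33 and the add of c in
   series on the loop-carried value h.  */
static uint32_t
hash_ref (const char *s)
{
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    h = h * 33 + *p;
  return h;
}

/* Same value, but with the grouping h*32 + (h + c) pinned in place.  */
static uint32_t
hash_barrier (const char *s)
{
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
    {
      uint32_t t = h + *p;
      /* Empty asm body: emits no instructions.  The "+r" operand makes
         the compiler treat t as an opaque register value at this point,
         so it cannot fold t back into the expression below and regroup
         it as (h*32 + h) + c.  (Hypothetical placement, for
         illustration only.)  */
      __asm__ ("" : "+r" (t));
      h = h * 32 + t;
    }
  return h;
}

/* Sanity helper: both groupings must produce the same hash.  */
static void
check_same (const char *s)
{
  assert (hash_ref (s) == hash_barrier (s));
}
```

Both functions return identical results for any input; only the shape of the
dependency chain on h that the compiler is allowed to emit differs. Calling
check_same ("printf") is a quick way to confirm the two agree.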