From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84])
	by sourceware.org (Postfix) with ESMTPS id 0E346385B804
	for ; Mon, 16 May 2022 19:41:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0E346385B804
Received: from [10.10.3.121] (unknown [10.10.3.121])
	by mail.ispras.ru (Postfix) with ESMTPS id ECA6040755C3;
	Mon, 16 May 2022 19:41:13 +0000 (UTC)
Date: Mon, 16 May 2022 22:41:13 +0300 (MSK)
From: Alexander Monakov
To: Noah Goldstein
cc: Siddhesh Poyarekar, GNU C Library
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
	<20220511030635.154689-1-goldstein.w.n@gmail.com>
	<20220511030635.154689-6-goldstein.w.n@gmail.com>
	<1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
	<1016566-92e6-5aed-b757-c6fdafa68ae@ispras.ru>
	<0cd799bb-5a54-cd71-ca97-58cc62480b4f@gotplt.org>
	<4cb8e190-db42-8284-2237-2d82537f593@ispras.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
	SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham
	autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
X-List-Received-Date: Mon, 16 May 2022 19:41:16 -0000

On Mon, 16 May 2022, Noah Goldstein wrote:

> > The empty asms are used to prevent compiler reassociating 'h*32 + (h + c)'
> > to '(h*32 + h) + c' which looks fine in isolation, but significantly changes
> > the dependency graph in context of the whole loop.
>
> Some architecture could have a really fast integer MADD instruction that
> the barrier could either prevent from being emitted or add an extra ADD
> instruction at the end of.

With the barrier I'd expect a shift-by-5 and two additions, no madd. Modern
aarch64 cores have a 3-cycle madd, I believe, so it's 3 cycles if the compiler
decides to emit madd vs. 2 cycles if it's only additions.

Alexander
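
For illustration only, here is a minimal sketch of the barrier pattern
discussed above. It is not the code from the patch: the function names
hash_plain and hash_barrier are hypothetical, the loop handles one character
per iteration for brevity, and the empty asm with a "+r" constraint assumes
GNU C (GCC or Clang). The asm makes the value of t opaque to the optimizer,
so 'h*32 + (h + c)' cannot be rewritten as '(h*32 + h) + c' or folded into a
single multiply-add.

```c
#include <stdint.h>

/* Reference form of the hash update: h = h * 33 + c.  The compiler is
   free to pick a multiply, a madd, or shift-and-add here.  */
static uint32_t
hash_plain (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h;
}

/* Same result, computed as h*32 + (h + c).  The empty asm emits no
   instructions; it only tells the compiler that t may have changed,
   which pins the association to a shift and two additions.  */
static uint32_t
hash_barrier (const char *s)
{
  uint32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    {
      uint32_t t = h + c;
      __asm__ ("" : "+r" (t));	/* Optimization barrier (GNU C assumed).  */
      h = (h << 5) + t;
    }
  return h;
}
```

Both functions return the same value for any string, e.g. 5381 for "" and
0x7c967e3f for "exit"; only the shape of the generated code should differ.
Comparing the output of 'gcc -O2 -S' for the two loops on the target of
interest is one way to check whether a madd is emitted.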