From: Alexander Monakov
To: Adhemerval Zanella
cc: GNU C Library
Date: Mon, 16 May 2022 23:27:43 +0300 (MSK)
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h

On Mon, 16 May 2022, Adhemerval Zanella via Libc-alpha wrote:

> >> How hard would it be to make the compiler do this very optimization?
> >> I raised this on the weekly call because it increasingly seems that tuning
> >> computation dependencies in loops is more the compiler's job than libc's
> >> (although this is not a blocker, we have had multiple small
> >> micro-optimizations in the past that turned into dead code due to the
> >> compiler catching up).
> >
> > Sorry, since you're responding to a discussion about multiply-add, it's
> > unclear to me which optimization you mean. Is your question about choosing
> > which sequence of additions has the shorter cross-iteration chain?
>
> Indeed, I was not clear. I mean the reply to [1] where you explain why
> you suggested the asm to prevent the compiler from reassociating.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2022-May/138794.html

I think it's pretty hard: you'd have to decompose 'h*33' into '(h<<5)+h' in
the reassociation pass, notice that it's part of an addition chain that feeds
the phi node for 'h', and based on that select a specific association variant
(all to shave off one cycle per iteration). To me it looks like an
optimization just for this exact scenario. And then you need to "hope" that no
other pass undoes this transformation.

It would be quite a lot of nontrivial code in the compiler, when the
alternative is getting a guaranteed outcome with any compiler by adding an
empty asm statement to a loop that iterates thousands of times on every
process startup.

Alexander
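To make the association question concrete, here is a minimal C sketch, assuming the GNU hash recurrence h = h*33 + c with h starting at 5381 over unsigned bytes. The function names hash_plain, hash_asm and check_same are hypothetical; this is not the code from the patch, only an illustration of pinning evaluation order with an empty GNU C asm statement. Assuming single-cycle shifts and adds, the plain form may be compiled as ((h<<5)+h)+c, i.e. three dependent operations on the loop-carried 'h', while computing h+c first and hiding it behind the asm leaves two (the shift and h+c can run in parallel, followed by one add).

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward form: the compiler chooses the association, e.g.
   ((h << 5) + h) + c, which puts three operations on the chain for h.  */
static uint32_t
hash_plain (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint32_t h = 5381;
  for (; *p != '\0'; p++)
    h = h * 33 + *p;
  return h;
}

/* Same hash with the association pinned: t = h + c is computed first and
   passed through an empty asm, so the compiler cannot see what t holds
   and cannot fold it back into another ordering of h * 33 + c.  */
static uint32_t
hash_asm (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint32_t h = 5381;
  for (; *p != '\0'; p++)
    {
      uint32_t t = h + *p;
      /* Emits no instructions; "+r" tells the compiler t may have been
         modified, which blocks reassociation across this point.  */
      __asm__ ("" : "+r" (t));
      h = (h << 5) + t;   /* equals h * 33 + c modulo 2^32 */
    }
  return h;
}

/* The asm only constrains evaluation order, so both must agree.  */
static void
check_same (const char *s)
{
  assert (hash_plain (s) == hash_asm (s));
}
```

Both functions return identical values; they can differ only in instruction scheduling. Whether the pinned order actually saves a cycle per iteration depends on the target CPU and is something to measure rather than assume.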