From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <amonakov@ispras.ru>
Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84])
 by sourceware.org (Postfix) with ESMTPS id C9364383D821
 for <libc-alpha@sourceware.org>; Mon, 16 May 2022 14:31:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C9364383D821
Received: from [10.10.3.121] (unknown [10.10.3.121])
 by mail.ispras.ru (Postfix) with ESMTPS id 9F37840D403E;
 Mon, 16 May 2022 14:31:49 +0000 (UTC)
Date: Mon, 16 May 2022 17:31:49 +0300 (MSK)
From: Alexander Monakov <amonakov@ispras.ru>
To: Siddhesh Poyarekar <siddhesh@gotplt.org>
cc: Noah Goldstein <goldstein.w.n@gmail.com>, libc-alpha@sourceware.org
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
In-Reply-To: <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
Message-ID: <c3b8d5be-ace9-365d-8717-38dd27e5b6f@ispras.ru>
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-6-goldstein.w.n@gmail.com>
 <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 16 May 2022 14:32:00 -0000

On Mon, 16 May 2022, Siddhesh Poyarekar wrote:

> There are a couple of things that seem problematic to me about this:
> 
> - It seems like we're trying to fix a gcc issue in glibc.  Couldn't we file a
> gcc bug and explore ways in which this could be supported in the compiler?  In
> fact, it might make sense to do that for the original loop; it looks like a
> missed optimization that gcc ought to fix.  IMO the bug should be filed even
> if we do end up with this micro-optimization in glibc.

This issue involves a chain of dependencies that goes across all loop
iterations, but relevant compiler optimization (reassociation, register
allocation, scheduling) do not consider such global chains. You might
file this as a "wishlist" bug, but compiler infrastructure is simply
not designed to make such nontrivial decisions.

> - The patch controls an instruction schedule so that it works well on
> out-of-order processors but then only quoting one microarchitecture.

It's not specific to out-of-order processors: a long chain of dependencies
restricts OoO scheduling in the CPU. So in the end it benefits "classic"
and OoO pipelines in a similar fashion.

> If it
> works well on TigerLake (and on x86 in general) then it might be better to add
> it as a sysdep override; I assumed that was the point of breaking the function
> out into its header anyway.  If it is more generally useful then please share
> numbers to that effect in the commit message and also explicitly state in the
> comments why we're trying to exert this level of control on codegen in generic
> C code and why it is good for all architectures.

I guess it's up to you and Noah to hash it out, but I'd like to remind that
there was an alternative variant which is a strict win on all architectures
(same code size, same instruction mix, no dependency on fast multiplication).
That might be easier to justify from generic code point of view.

Alexander