Message-ID: <7cc432ef-4c50-9172-fe1d-d96a1f2c8d7b@gotplt.org>
Date: Tue, 17 May 2022 07:15:55 +0530
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in
 dl-new-hash.h
To: Alexander Monakov
Cc: Noah Goldstein, libc-alpha@sourceware.org
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-1-goldstein.w.n@gmail.com>
 <20220511030635.154689-6-goldstein.w.n@gmail.com>
 <1b419b02-0dee-813b-de4c-1fdc0779174a@gotplt.org>
 <1016566-92e6-5aed-b757-c6fdafa68ae@ispras.ru>
 <0cd799bb-5a54-cd71-ca97-58cc62480b4f@gotplt.org>
 <4cb8e190-db42-8284-2237-2d82537f593@ispras.ru>
From: Siddhesh Poyarekar
In-Reply-To: <4cb8e190-db42-8284-2237-2d82537f593@ispras.ru>

On 17/05/2022 00:58, Alexander Monakov wrote:
>> I did explain; I am not comfortable with controlling instruction scheduling in
>> that manner for generic code because it assumes more about underlying
>> processor pipelines and instruction sequences than we typically do in generic
>> code. It has nothing to do with portability. Adhemerval raised the question
>> about whether this ought to be done in gcc instead, which I concurred with
>> too.
>
> Thank you very much for the detailed response.
> Allow me to clear up what seems
> to be a technical misunderstanding here: this is not about instruction
> scheduling, but rather dependencies in the computations (I know Noah mentioned
> scheduling, but it's confusing especially in context of benchmarking for an
> out-of-order CPU).
>
> I have shown how different variants have different chains of dependencies in
> this email: https://sourceware.org/pipermail/libc-alpha/2022-May/138495.html

Agreed, but again, the latencies due to that dependency graph may have more or
less impact depending on the architecture, and they are ultimately linked to
the code schedule, so the difference is academic IMO.

> The empty asms are used to prevent compiler reassociating 'h*32 + (h + c)'
> to '(h*32 + h) + c' which looks fine in isolation, but significantly changes
> the dependency graph in context of the whole loop.
>
> There's nothing specific to the x86 architecture in this reasoning. On arm and
> aarch64 it's moot because they evaluate 'h*32 + h' in a single cycle, though.

That may well be true, but there are always architecture quirks that can throw
one off, quirks that may have been missed in testing or that may turn up later.
Like I conceded before, it may well be that my concerns are unfounded and that
gcc will generate the best code for all architectures with those barriers in
place, but that choice should be made explicitly, based on benchmarks.

Siddhesh
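
For reference, the two groupings discussed in the quoted text can be sketched
roughly as below. This is only an illustration under stated assumptions, not
the code from the patch: the function names hash_plain and hash_grouped are
hypothetical, the loop is a simple one-byte-per-iteration hash with the usual
5381 seed, and the empty asm relies on the GNU C extended-asm extension (GCC
and Clang). In the second variant, 'h*32' and 'h + c' do not depend on each
other, so only the final add has to wait for both; whether that is a win on a
given CPU is exactly the question that benchmarks would have to answer.

```c
#include <stdint.h>

/* Grouping '(h*32 + h) + c': the character is added only after the
   h*33 term has been formed.  */
static uint32_t
hash_plain (const char *str)
{
  uint32_t h = 5381;
  for (const unsigned char *s = (const unsigned char *) str; *s != 0; s++)
    h = (h * 32 + h) + *s;
  return h;
}

/* Grouping 'h*32 + (h + c)': the partial sum t = h + c is independent of
   h*32.  The empty asm (an assumption of this sketch, GNU C only) marks t
   as read and written by an opaque statement, so the compiler cannot
   reassociate t back into the h*33 form.  It emits no instructions.  */
static uint32_t
hash_grouped (const char *str)
{
  uint32_t h = 5381;
  for (const unsigned char *s = (const unsigned char *) str; *s != 0; s++)
    {
      uint32_t t = h + *s;
      __asm__ ("" : "+r" (t));
      h = h * 32 + t;
    }
  return h;
}
```

Both functions compute the same value for every input; they differ only in the
order of operations the compiler is allowed to emit, so comparing them is a
matter of timing each variant on the target architecture.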