From: Noah Goldstein
Date: Mon, 16 May 2022 14:48:03 -0500
Subject: Re: [PATCH v8 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
To: Alexander Monakov
Cc: Siddhesh Poyarekar, GNU C Library
List-Id: Libc-alpha mailing list

On Mon, May 16, 2022 at 2:41 PM Alexander Monakov wrote:
>
> On Mon, 16 May 2022, Noah Goldstein wrote:
>
> > > The empty asms are used to prevent compiler reassociating 'h*32 + (h + c)'
> > > to '(h*32 + h) + c' which looks fine in isolation, but significantly changes
> > > the dependency graph in context of the whole loop.
> >
> > Some architecture could have a really fast integer MADD instruction that
> > the barrier could either prevent from being emitted or add an extra ADD
> > instruction at the end of.
>
> With the barrier I'd expect a shift-by-5 and two additions, no madd. Modern
> aarch64 cores have 3-cycle madd I believe, so it's 3 cycles if the compiler
> decides to emit madd vs. 2 cycles if it's only additions.

AFAIK the shift-by-5 + 2x adds is the best that exists at the moment,
but the point is that some arch we aren't thinking about, or some future
variant, may implement a madd with a latency of 2 cycles or less.

>
> Alexander
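
For readers following the thread, here is a minimal sketch of the two
forms being compared. It is not the code from the patch: the function
names gnu_hash_plain and gnu_hash_barrier are hypothetical, and the loop
handles one character per iteration purely for illustration. It assumes
a GNU-compatible compiler for the empty asm; elsewhere the barrier is
simply omitted and the compiler is free to reassociate.

```c
#include <stdint.h>

/* Reference form of the hash: h = h * 33 + c, seeded with 5381.
   The compiler may lower this as (h*32 + h) + c or as a multiply-add.  */
static uint32_t
gnu_hash_plain (const char *s)
{
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; ++p)
    h = h * 33 + *p;
  return h;
}

/* Same value, written as h*32 + (h + c).  The empty asm makes the
   partial sum opaque to the optimizer, so it cannot be reassociated
   into (h*32 + h) + c.  The shift and the first add both depend only
   on the previous h, so they can issue in parallel.  */
static uint32_t
gnu_hash_barrier (const char *s)
{
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; ++p)
    {
      uint32_t t = h + *p;          /* h + c */
#ifdef __GNUC__
      __asm__ ("" : "+r" (t));      /* empty asm: no instructions emitted */
#endif
      h = (h << 5) + t;             /* h*32 + (h + c) */
    }
  return h;
}
```

Both functions return the same value for any input; only the shape of
the per-iteration dependency chain differs. Whether the barrier helps or
hurts on a given target is the question discussed above, and is best
checked by inspecting the generated code.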