From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <amonakov@ispras.ru>
Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84])
 by sourceware.org (Postfix) with ESMTPS id 4FA5F3856260
 for <libc-alpha@sourceware.org>; Thu,  5 May 2022 11:08:00 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4FA5F3856260
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=ispras.ru
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ispras.ru
Received: from [10.10.3.121] (unknown [10.10.3.121])
 by mail.ispras.ru (Postfix) with ESMTPS id 8C7E740D4004;
 Thu,  5 May 2022 11:07:52 +0000 (UTC)
Date: Thu, 5 May 2022 14:07:52 +0300 (MSK)
From: Alexander Monakov <amonakov@ispras.ru>
To: Noah Goldstein <goldstein.w.n@gmail.com>
cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v3 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h
In-Reply-To: <dc462cf-8f1d-e7c8-8a7f-b5face4ed25@ispras.ru>
Message-ID: <741653a-20e0-6dc4-d5ce-c826677cf36@ispras.ru>
References: <20220414041231.926415-1-goldstein.w.n@gmail.com>
 <20220425163601.3670626-1-goldstein.w.n@gmail.com>
 <20220425163601.3670626-6-goldstein.w.n@gmail.com>
 <991f2272-4ae6-b06b-ceb7-184e5d369118@ispras.ru>
 <CAFUsyfKeocq4VAusvnggq-NR=tOQTjrD0Z6r3CYCTjGQ=tGGSw@mail.gmail.com>
 <f54f1ec9-fc31-283f-bce9-59fd8bda98ad@ispras.ru>
 <CAFUsyfJS7NyL5cx9O3whMxg2+mbpyJAeTTkhCU3D20uAJzHvAQ@mail.gmail.com>
 <6d61305f-58-705b-bfb8-cfa243c41d3e@ispras.ru>
 <dc462cf-8f1d-e7c8-8a7f-b5face4ed25@ispras.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2022 11:08:01 -0000

On Wed, 4 May 2022, Alexander Monakov wrote:
> Tigerlake belongs to Intel CPU generations that don't perform move
> elimination for general registers (along with Icelake and Sandybridge).
> So register copy in preparation to shift 'h' by 5 costs an extra cycle
> on the critical path.
> 
> Furthermore, h*33 + *c gets evaluated as '((h + h*32) + *c) instead of
> '((h + *c) + h*32)', which prevents interleaving the additions and thus
> puts one extra add on the critical path.
> 
> (gcc-11 gets this right if you assist it by writing 'h = h + *c + h*32')
> 
> So due to the above issues, on Tigerlake you get 4 cycles for the original
> loop, and also 4 cycles for the modified two-at-a-time loop.

The following variant of the original loop avoids those issues and
should run close to 2 cycles per iteration on most CPUs:

static uint32_t
_dl_new_hash (const char *s)
{
  uint32_t h = 5381, c;
  const unsigned char *us = (const void *)s;
  while ((c = *us++))
    {
      c += h;
      asm("" : "+r"(h) : "r"(c));
      h = h * 32 + c;
    }
  return h;
}

Alexander