From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9608ac95-a7d5-7963-e4f8-5dc7b0247d82@linaro.org>
Date: Mon, 31 Jul 2023 19:57:40 -0300
Subject: Re: [PATCH v2] x86_64: Optimize ffsll function code size.
To: Sunil Pandey
Cc: libc-alpha@sourceware.org, hjl.tools@gmail.com
References: <20230731183549.2396362-1-skpgkp2@gmail.com> <587651e8-c4e1-7d83-76fa-7395f68e457f@linaro.org>
From: Adhemerval Zanella Netto
Organization: Linaro

On 31/07/23 17:58, Sunil Pandey wrote:
>
>
> On Mon, Jul 31, 2023 at 1:12 PM Adhemerval Zanella Netto wrote:
>
>
>
>     On 31/07/23 15:35, Sunil K Pandey via Libc-alpha wrote:
>     > Ffsll function size is 17 byte, this patch optimizes size to 16 byte.
>     > Currently ffsll function randomly regress by ~20%, depending on how
>     > code get aligned.
>     >
>     > This patch fixes ffsll function random performance regression.
>     >
>     > Changes from v1:
>     > - Further reduce size ffsll function size to 12 bytes.
>     > ---
>     >  sysdeps/x86_64/ffsll.c | 10 +++++-----
>     >  1 file changed, 5 insertions(+), 5 deletions(-)
>     >
>     > diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c
>     > index a1c13d4906..6a5803c7c1 100644
>     > --- a/sysdeps/x86_64/ffsll.c
>     > +++ b/sysdeps/x86_64/ffsll.c
>     > @@ -26,13 +26,13 @@ int
>     >  ffsll (long long int x)
>     >  {
>     >    long long int cnt;
>     > -  long long int tmp;
>     >
>     > -  asm ("bsfq %2,%0\n"                /* Count low bits in X and store in %1.  */
>     > -       "cmoveq %1,%0\n"              /* If number was zero, use -1 as result.  */
>     > -       : "=&r" (cnt), "=r" (tmp) : "rm" (x), "1" (-1));
>     > +  asm ("mov $-1,%k0\n"       /* Intialize CNT to -1.  */
>     > +       "bsf %1,%0\n" /* Count low bits in X and store in CNT.  */
>     > +       "inc %k0\n"   /* Increment CNT by 1.  */
>     > +       : "=&r" (cnt) : "r" (x));
>     >
>     > -  return cnt + 1;
>     > +  return cnt;
>     >  }
>     >
>     >  #ifndef __ILP32__
>
>     I still prefer if we can just remove this arch-optimized function in favor
>     in compiler builtins.
>
>
> Sure, compiler builtin should replace it in the long run.
> In the meantime, can it get fixed?

This fix only works if the compiler does not insert anything in the prologue;
if you use CET or -fstack-protector-strong it might not work.  And I *really*
do not want to add another assembly optimization to a symbol that won't be
used in most real programs.

I already have a fix that uses compiler builtins [1].

[1] https://patchwork.sourceware.org/project/glibc/patch/20230717143431.2075924-1-adhemerval.zanella@linaro.org/