From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id 960243858D28 for ; Mon, 31 Jul 2023 23:44:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 960243858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-3179ed1dfbbso1734623f8f.1 for ; Mon, 31 Jul 2023 16:44:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690847079; x=1691451879; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=RQvoMC5ChRxA1OzMdquNaV3u/cR7KEbHlOyRwbTLT3w=; b=hsSW3r7FbkV+1m4tfIf1rhj1zRxRkc7G+rhvYh2QPLvz+KQgU3WJz0aM2qArahRNFg h0W+Op6bYubf5uw2gRsL58YkKOfk3y8fZ6wNV9wn0opAifFZp8IjF+AYf/zbHDg29Fgo fLMEOfW8EigYqNI8/Cc9vaNOZ4RD0mCtaAPZO/VEpcguj1EdNmQtSgIbM+uLiQJ9MKBX HhblsfnTTSkuQWwFg+D9T6mshr+PHZDxFAnxrpRIctSKNA6E1C6bg2Bx2ZADoxHMqkJ2 hJgVbjqE9ETkN8x1FnoQfgvbvNL06ZbrwgzyTHeppqHIptxxw9bSbHJGo9UclwryrDEJ iORQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690847079; x=1691451879; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=RQvoMC5ChRxA1OzMdquNaV3u/cR7KEbHlOyRwbTLT3w=; b=Z/dQky4dHlJqu9w50HkD1cSuy2LLq2hACoginjp8kfmrvj6bxl8IMwnVklEnUImZhB pTHd4ojLuhSh/8B4wOjMejGYFBMLGUl6LjYet3euwCcxJhErntjAoJGTZT0VtCEH7y76 RT9p6JrFopARu2ppvOjLI4B3qXLAzNI/T13MpMHWUvUdmmVU3j3FZ4X3V/Ix1OI/iGKA antf4gBjJJOZTeI8yqq6nk6Gi729SVEjiod/VpFbjFabpk1OWdkCloM6fmqaKkJGjJA+ cezWWmrSgLoHFZiZSafgf8hQG1xGCW5bi3YqqZt3Ho/kSxmKwKm6aRYFoqLESu7U4LDB Ffiw== X-Gm-Message-State: ABy/qLaWPtUErlj+TrIA1US4WrYZtvS9hCEVCx9FjKBppVv8n8eA57L3 s9sMPPsqJ0DJVv3/+l8F2m5gYqQtRMrDlmPXhZCZ7uuescA= X-Google-Smtp-Source: APBJJlHti/F7xlApuDDYhmdBvp8VhwCeGbd5RGNx55n9szz5ETEjpKzKTUouWNTj83P/fnb4gyFscwJviKXEg4aFwbM= X-Received: by 2002:a5d:474c:0:b0:317:631b:43ce with SMTP id o12-20020a5d474c000000b00317631b43cemr907381wrs.41.1690847078681; Mon, 31 Jul 2023 16:44:38 -0700 (PDT) MIME-Version: 1.0 References: <20230731183549.2396362-1-skpgkp2@gmail.com> <587651e8-c4e1-7d83-76fa-7395f68e457f@linaro.org> <9608ac95-a7d5-7963-e4f8-5dc7b0247d82@linaro.org> In-Reply-To: <9608ac95-a7d5-7963-e4f8-5dc7b0247d82@linaro.org> From: Sunil Pandey Date: Mon, 31 Jul 2023 16:44:02 -0700 Message-ID: Subject: Re: [PATCH v2] x86_64: Optimize ffsll function code size. To: Adhemerval Zanella Netto Cc: libc-alpha@sourceware.org, hjl.tools@gmail.com Content-Type: multipart/alternative; boundary="0000000000007e8f740601d1034d" X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,HK_RANDOM_ENVFROM,HK_RANDOM_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000007e8f740601d1034d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Jul 31, 2023 at 3:57=E2=80=AFPM Adhemerval Zanella Netto < adhemerval.zanella@linaro.org> wrote: > > > On 31/07/23 17:58, Sunil Pandey wrote: > > > > > > On Mon, Jul 31, 2023 at 1:12=E2=80=AFPM Adhemerval Zanella Netto < > adhemerval.zanella@linaro.org > > wrote: > > > > > > > > On 31/07/23 15:35, Sunil K Pandey via Libc-alpha wrote: > > > Ffsll function size is 17 byte, this patch optimizes size to 16 > byte. > > > Currently ffsll function randomly regress by ~20%, depending on h= ow > > > code get aligned. > > > > > > This patch fixes ffsll function random performance regression. > > > > > > Changes from v1: > > > - Further reduce size ffsll function size to 12 bytes. > > > --- > > > sysdeps/x86_64/ffsll.c | 10 +++++----- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c > > > index a1c13d4906..6a5803c7c1 100644 > > > --- a/sysdeps/x86_64/ffsll.c > > > +++ b/sysdeps/x86_64/ffsll.c > > > @@ -26,13 +26,13 @@ int > > > ffsll (long long int x) > > > { > > > long long int cnt; > > > - long long int tmp; > > > > > > - asm ("bsfq %2,%0\n" /* Count low bits in X and > store in %1. */ > > > - "cmoveq %1,%0\n" /* If number was zero, use > -1 as result. */ > > > - : "=3D&r" (cnt), "=3Dr" (tmp) : "rm" (x), "1" (-1)); > > > + asm ("mov $-1,%k0\n" /* Intialize CNT to -1. */ > > > + "bsf %1,%0\n" /* Count low bits in X and store in CNT. */ > > > + "inc %k0\n" /* Increment CNT by 1. */ > > > + : "=3D&r" (cnt) : "r" (x)); > > > > > > - return cnt + 1; > > > + return cnt; > > > } > > > > > > #ifndef __ILP32__ > > > > > > > > I still prefer if we can just remove this arch-optimized function in > favor > > in compiler builtins. > > > > > > Sure, compiler builtin should replace it in the long run. > > In the meantime, can it get fixed? > > This fix only works if compiler does not insert anything in the prologue, > if > you use CET or stack protector strong it might not work. And I *really* > do not want to add another assembly optimization to a symbol that won't > be used in most real programs. > v2 will fix it, as CET overhead is 4 byte and the latest code size is only 12 byte. So toal code size will be 16 byte even if CET gets enabled. > And already have a fix to use compiler builtins [1]. > Here is code generated from the builtin patch. (gdb) b ffsll Breakpoint 1 at 0x10a0 (gdb) run --direct Starting program: benchtests/bench-ffsll --direct Breakpoint 1, __ffsll (i=3D66900473708975203) at ffsll.c:30 30 return __builtin_ffsll (i); (gdb) disass Dump of assembler code for function __ffsll: =3D> 0x00007ffff7c955a0 <+0>: bsf %rdi,%rdi 0x00007ffff7c955a4 <+4>: mov $0xffffffffffffffff,%rax 0x00007ffff7c955ab <+11>: cmove %rax,%rdi 0x00007ffff7c955af <+15>: lea 0x1(%rdi),%eax 0x00007ffff7c955b2 <+18>: ret End of assembler dump. (gdb) It is not going to fix the problem. Random 20% variation will continue even with builtin patch in benchmarking. I do not see anything wrong using architecture features, if it helps people save their valuable debugging time. After all, its valuable glibc feature to override generic implementation with architecture specific code. > [1] > https://patchwork.sourceware.org/project/glibc/patch/20230717143431.20759= 24-1-adhemerval.zanella@linaro.org/ > --0000000000007e8f740601d1034d--