Re: [PATCH v2] x86_64: Optimize ffsll function code size.

public inbox for libc-alpha@sourceware.org

From: Sunil Pandey <skpgkp2@gmail.com>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org, hjl.tools@gmail.com
Subject: Re: [PATCH v2] x86_64: Optimize ffsll function code size.
Date: Mon, 31 Jul 2023 16:44:02 -0700
Message-ID: <CAMAf5_cnjr3x2tTH9+LVmeAweyp7EAqUhgd8429Qv+UEAUWeAQ@mail.gmail.com>
In-Reply-To: <9608ac95-a7d5-7963-e4f8-5dc7b0247d82@linaro.org>


On Mon, Jul 31, 2023 at 3:57 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:

>
>
> On 31/07/23 17:58, Sunil Pandey wrote:
> >
> >
> > On Mon, Jul 31, 2023 at 1:12 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
> >
> >
> >
> >     On 31/07/23 15:35, Sunil K Pandey via Libc-alpha wrote:
> >     > Ffsll function size is 17 byte, this patch optimizes size to 16 byte.
> >     > Currently ffsll function randomly regress by ~20%, depending on how
> >     > code get aligned.
> >     >
> >     > This patch fixes ffsll function random performance regression.
> >     >
> >     > Changes from v1:
> >     > - Further reduce size ffsll function size to 12 bytes.
> >     > ---
> >     >  sysdeps/x86_64/ffsll.c | 10 +++++-----
> >     >  1 file changed, 5 insertions(+), 5 deletions(-)
> >     >
> >     > diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c
> >     > index a1c13d4906..6a5803c7c1 100644
> >     > --- a/sysdeps/x86_64/ffsll.c
> >     > +++ b/sysdeps/x86_64/ffsll.c
> >     > @@ -26,13 +26,13 @@ int
> >     >  ffsll (long long int x)
> >     >  {
> >     >    long long int cnt;
> >     > -  long long int tmp;
> >     >
> >     > -  asm ("bsfq %2,%0\n"                /* Count low bits in X and store in %1.  */
> >     > -       "cmoveq %1,%0\n"              /* If number was zero, use -1 as result.  */
> >     > -       : "=&r" (cnt), "=r" (tmp) : "rm" (x), "1" (-1));
> >     > +  asm ("mov $-1,%k0\n"       /* Intialize CNT to -1.  */
> >     > +       "bsf %1,%0\n" /* Count low bits in X and store in CNT.  */
> >     > +       "inc %k0\n"   /* Increment CNT by 1.  */
> >     > +       : "=&r" (cnt) : "r" (x));
> >     >
> >     > -  return cnt + 1;
> >     > +  return cnt;
> >     >  }
> >     >
> >     >  #ifndef __ILP32__
> >
> >
> >
> >     I still prefer if we can just remove this arch-optimized function in favor
> >     of compiler builtins.
> >
> >
> > Sure, compiler builtin should replace it in the long run.
> > In the meantime, can it get fixed?
>
> This fix only works if the compiler does not insert anything in the prologue;
> if you use CET or stack protector strong it might not work.  And I *really*
> do not want to add another assembly optimization to a symbol that won't
> be used in most real programs.
>

v2 fixes this: the CET overhead is 4 bytes and the latest code size is only
12 bytes, so the total code size will be 16 bytes even if CET gets enabled.


> And I already have a fix to use compiler builtins [1].
>

Here is the code generated with the builtin patch.

(gdb) b ffsll
Breakpoint 1 at 0x10a0
(gdb) run --direct
Starting program: benchtests/bench-ffsll --direct
Breakpoint 1, __ffsll (i=66900473708975203) at ffsll.c:30
30  return __builtin_ffsll (i);
(gdb) disass
Dump of assembler code for function __ffsll:
=> 0x00007ffff7c955a0 <+0>: bsf    %rdi,%rdi
   0x00007ffff7c955a4 <+4>: mov    $0xffffffffffffffff,%rax
   0x00007ffff7c955ab <+11>: cmove  %rax,%rdi
   0x00007ffff7c955af <+15>: lea    0x1(%rdi),%eax
   0x00007ffff7c955b2 <+18>: ret
End of assembler dump.
(gdb)

It is not going to fix the problem. The function above is 19 bytes (the ret
is at offset +18), so the random ~20% variation in benchmarking will continue
even with the builtin patch.
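
To make the size argument concrete, here is a rough sketch of the
arithmetic. It assumes that what matters is how many 16-byte blocks the
function body touches, which is one reading of the alignment discussion
rather than something measured; blocks_spanned is a made-up helper. The
sizes are the ones quoted in this thread: 12 bytes for v2, 4 more with CET,
17 for the current code, and 19 for the builtin version.

```c
#include <stddef.h>

/* Hypothetical helper: number of BLOCK-sized, BLOCK-aligned chunks touched
   by SIZE bytes of code starting at address START.  SIZE must be > 0.  */
static size_t
blocks_spanned (size_t start, size_t size, size_t block)
{
  size_t first = start / block;
  size_t last = (start + size - 1) / block;

  return last - first + 1;
}

/* Sizes in bytes, taken from the thread.  */
enum
{
  SIZE_V2 = 12,		/* v2 patch body.  */
  SIZE_V2_CET = 12 + 4,	/* v2 plus the 4-byte CET overhead.  */
  SIZE_CURRENT = 17,	/* Existing implementation.  */
  SIZE_BUILTIN = 19	/* Builtin version: ret at +18, one byte long.  */
};
```

With a 16-byte-aligned entry such as the low bits 0x5a0 from the gdb
session, blocks_spanned (0x5a0, SIZE_V2_CET, 16) is 1, while the same call
with SIZE_CURRENT or SIZE_BUILTIN gives 2.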

I do not see anything wrong with using architecture features if it helps
people save valuable debugging time. After all, being able to override the
generic implementation with architecture-specific code is a valuable glibc
feature.


> [1]
> https://patchwork.sourceware.org/project/glibc/patch/20230717143431.2075924-1-adhemerval.zanella@linaro.org/
>
