public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: 'GNU C Library' <libc-alpha@sourceware.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Subject: [PATCH 04/17] Add string vectorized find and detection functions
Date: Sat, 3 Sep 2022 13:13:19 +0000	[thread overview]
Message-ID: <AS4PR08MB79018243EE6ABB25C95FDB5C837D9@AS4PR08MB7901.eurprd08.prod.outlook.com> (raw)

Hi Adhemerval,

+static inline unsigned int
+__clz (op_t x)
+{
+#if !HAVE_BUILTIN_CLZ
+  unsigned r;
+  op_t i;
+
+  x |= x >> 1;
+  x |= x >> 2;
+  x |= x >> 4;
+  x |= x >> 8;
+  x |= x >> 16;
+# if __WORDSIZE == 64
+  x |= x >> 32;
+  i = x * 0x03F79D71B4CB0A89ull >> 58;
+# else
+  i = x * 0x07C4ACDDU >> 27;
+# endif
+  r = index_access (i);
+  return r ^ (sizeof (op_t) * CHAR_BIT - 1);
+#else
+  if (sizeof (op_t) == sizeof (long int))
+    return __builtin_clzl (x);
+  else
+    return __builtin_clzll (x);
+#endif
+}

This is a really bad idea. Firstly it is incorrect - sizeof (op_t) != __WORDSIZE due to
the odd way it is defined (it can be 64 bits on 32-bit targets). That in itself is
problematic since it isn't clear that using 64 bits operations extensively is efficient
on 32-bit targets (using 64-bit multiplies in GMP is different from using 64-bit
load/store in memcpy/memset which is different from 64-bit logical operations and
shifts, so all of these should be decoupled rather than forced together).

Secondly, there are already several ways to use count leading zeroes in GLIBC.
One is use the builtin unconditionally (done in lots of places, eg. by math code),
another is count_leading_zeros defined in longlong.h. This would add the third way. 
It's not clear how much gain inlining gives over using the libgcc implementation,
but if it is significant then we could provide a generic inline clzl/clzll that can be
used throughout GLIBC (replacing existing builtin_clz and count_leading_zeros).

Finally, emulating a full clz is inefficient. If you have already called find_zero_low
then there are at most 4 bits set on a 32-bit LE target, so you can trivially get the
index of the first zero byte via:

x = x & -x;
x = (x >> 15) + (x >> 22) + 3 * (x >> 31);

This is many times faster. There may be similar sequences for big-endian, but
you could just do a multiply with a magic word that gives the correct result
without needing a lookup table.

Cheers,
Wilco

             reply	other threads:[~2022-09-03 13:13 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-03 13:13 Wilco Dijkstra [this message]
2022-09-19 13:59 ` Adhemerval Zanella Netto
  -- strict thread matches above, loose matches on Subject: below --
2022-09-02 20:39 [PATCH 00/17] Improve generic string routines Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 04/17] Add string vectorized find and detection functions Adhemerval Zanella
2022-09-03  3:20   ` Noah Goldstein
2022-09-19 14:00     ` Adhemerval Zanella Netto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AS4PR08MB79018243EE6ABB25C95FDB5C837D9@AS4PR08MB7901.eurprd08.prod.outlook.com \
    --to=wilco.dijkstra@arm.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).