public inbox for libc-alpha@sourceware.org
* [PATCH v3 0/9] Add arc4random support
@ 2022-04-19 21:28 Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC
  To: libc-alpha

This patch set adds arc4random, arc4random_buf, and arc4random_uniform,
along with optimized ChaCha20 implementations for x86_64, aarch64,
powerpc64, and s390x.

The generic implementation is based on scalar ChaCha20, with a per-thread
state cache allocated in the TCB.  The internal state keeps a 256-byte
buffer (8 ChaCha20 blocks) plus the cipher state, which allows better
use of the vectorized implementations.  It would be possible to use just
128 bytes, but that would require rewriting the AVX2 optimization (and
would possibly lower performance slightly).

The initial seed and each reseed obtain entropy from getrandom, with
/dev/urandom as a fallback.  The internal state is reseeded after every
16MB of consumed output.
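
The seeding order described above can be sketched as follows.  This is
a minimal illustration, not the code from the patch: fill_entropy is a
hypothetical name, it uses the public getrandom, open, and read
interfaces instead of the internal non-cancellable ones, and it reports
failure to the caller where the patch aborts the process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill BUF with LEN bytes of kernel entropy.
   Returns 0 on success and -1 on failure.  */
static int
fill_entropy (unsigned char *buf, size_t len)
{
  assert (buf != NULL);

  /* First choice: the getrandom system call, without blocking.  */
  ssize_t r = getrandom (buf, len, GRND_NONBLOCK);
  if (r >= 0 && (size_t) r == len)
    return 0;

  /* Fallback: read from /dev/urandom until the buffer is full.  */
  int fd = open ("/dev/urandom", O_RDONLY);
  if (fd == -1)
    return -1;

  size_t done = 0;
  while (done < len)
    {
      ssize_t n = read (fd, buf + done, len - done);
      if (n > 0)
        done += (size_t) n;
      else if (n == 0 || errno != EINTR)
        break;          /* End of file or a real error: give up.  */
    }
  close (fd);
  return done == len ? 0 : -1;
}
```

A caller would request key plus IV in one call, for instance 48 bytes,
and treat a -1 return as fatal.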

There is no fork detection; the internal state is reset only by the
atfork handler.  Direct clone calls, vfork, and _Fork are not handled.
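
As an illustration of the atfork approach, the sketch below registers a
child handler with pthread_atfork that wipes a generator state.  The
names rng_state, reset_in_child, install_fork_reset, and
child_sees_reset are hypothetical, and the state is a plain global
rather than the per-thread cache used by this series.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical generator state; the real layout is defined by the
   patch.  */
static struct
{
  unsigned char key[32];
  int seeded;
} rng_state;

/* Runs in the child after fork: wipe the state so the child reseeds.  */
static void
reset_in_child (void)
{
  memset (&rng_state, 0, sizeof rng_state);
}

/* Register the child handler.  Returns 0 on success.  */
static int
install_fork_reset (void)
{
  return pthread_atfork (NULL, NULL, reset_in_child);
}

/* Fork and report what the child observed: 1 if the state was wiped,
   0 if it was inherited unchanged, -1 on error.  */
static int
child_sees_reset (void)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (rng_state.seeded == 0 ? 0 : 1);

  assert (pid > 0);
  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  if (!WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status) == 0;
}
```

Handlers registered this way run only for fork, which matches the
limitation stated above: a child created by vfork or a direct clone
call would inherit the parent's state unchanged.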

Although it is lock-free, arc4random is still not async-signal-safe
(the per-thread state is not updated atomically).

The generic ChaCha20 implementation is based on RFC8439 [1], without
the final XOR step.  Since the input stream is either zero bytes
(initial state) or the PRNG output itself, this step does not add any
extra entropy.

The optimized ChaCha20 implementations for x86_64, aarch64, powerpc64,
and s390x use vector instructions and are based on libgcrypt code.

ChaCha20 is used because it is the standard cipher in other arc4random
implementations (BSDs, MacOSX) and, recently, in the Linux random
subsystem.  It also offers a very cheap rekey, which periodically uses
kernel entropy to improve randomness; it is also simpler than AES, and
shows better performance when no specialized instructions are present.

[1] https://sourceware.org/pipermail/libc-alpha/2018-June/094879.html

v3:
* Add per-thread cache to remove the lock usage.  It should improve both
  performance and scalability.
* Improve benchmark precision.
* Fixed Hurd test build.

v2:
* Removed the last XOR operation from the ChaCha20 implementation (it
  does not do much for arc4random usage).
* Add tst-arc4random-chacha20.c and refactor to check against the
  expected implementation.
* Fixed aarch64 implementation (a last change to move symbols to hidden
  did not change the relocation to use it as well).
* Refactor x86 SSSE3 to SSE2.
* Fixed powerpc64 implementation on BE (use the correct macro to check
  for endianness instead of the ones from libgcrypt).
* Add s390x optimized ChaCha20 implementation.


Adhemerval Zanella (9):
  stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ
    #4417)
  stdlib: Add arc4random tests
  benchtests: Add arc4random benchtest
  aarch64: Add optimized chacha20
  x86: Add SSE2 optimized chacha20
  x86: Add AVX2 optimized chacha20
  powerpc64: Add optimized chacha20
  s390x: Add optimized chacha20
  stdlib: Add TLS optimization to arc4random

 LICENSES                                      |  22 +
 NEWS                                          |   4 +-
 benchtests/Makefile                           |   6 +-
 benchtests/bench-arc4random.c                 | 224 +++++++
 include/stdlib.h                              |  13 +
 nptl/allocatestack.c                          |   5 +-
 posix/fork.c                                  |   2 +
 stdlib/Makefile                               |   9 +
 stdlib/Versions                               |   5 +
 stdlib/arc4random.c                           | 178 ++++++
 stdlib/arc4random.h                           |  45 ++
 stdlib/arc4random_uniform.c                   | 148 +++++
 stdlib/chacha20.c                             | 164 +++++
 stdlib/stdlib.h                               |  14 +
 stdlib/tst-arc4random-chacha20.c              | 166 ++++++
 stdlib/tst-arc4random-fork.c                  | 174 ++++++
 stdlib/tst-arc4random-stats.c                 | 146 +++++
 stdlib/tst-arc4random-thread.c                | 278 +++++++++
 sysdeps/aarch64/Makefile                      |   4 +
 sysdeps/aarch64/chacha20-neon.S               | 323 ++++++++++
 sysdeps/aarch64/chacha20_arch.h               |  40 ++
 sysdeps/generic/chacha20_arch.h               |  24 +
 sysdeps/generic/not-cancel.h                  |   2 +
 sysdeps/generic/tls-internal-struct.h         |   3 +
 sysdeps/mach/hurd/i386/libc.abilist           |   3 +
 sysdeps/mach/hurd/not-cancel.h                |   3 +
 sysdeps/powerpc/powerpc64/Makefile            |   3 +
 sysdeps/powerpc/powerpc64/chacha20-ppc.c      | 236 ++++++++
 sysdeps/powerpc/powerpc64/chacha20_arch.h     |  47 ++
 sysdeps/s390/s390-64/Makefile                 |   4 +
 sysdeps/s390/s390-64/chacha20-vx.S            | 564 ++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 ++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
 .../sysv/linux/microblaze/be/libc.abilist     |   3 +
 .../sysv/linux/microblaze/le/libc.abilist     |   3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/tls-internal.h        |  27 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
 sysdeps/x86_64/Makefile                       |   7 +
 sysdeps/x86_64/chacha20-avx2.S                | 313 ++++++++++
 sysdeps/x86_64/chacha20-sse2.S                | 311 ++++++++++
 sysdeps/x86_64/chacha20_arch.h                |  48 ++
 71 files changed, 3711 insertions(+), 5 deletions(-)
 create mode 100644 benchtests/bench-arc4random.c
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random.h
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c
 create mode 100644 stdlib/tst-arc4random-chacha20.c
 create mode 100644 stdlib/tst-arc4random-fork.c
 create mode 100644 stdlib/tst-arc4random-stats.c
 create mode 100644 stdlib/tst-arc4random-thread.c
 create mode 100644 sysdeps/aarch64/chacha20-neon.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20_arch.h
 create mode 100644 sysdeps/s390/s390-64/chacha20-vx.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 create mode 100644 sysdeps/x86_64/chacha20-avx2.S
 create mode 100644 sysdeps/x86_64/chacha20-sse2.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h

-- 
2.32.0



* [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:52   ` H.J. Lu
                     ` (2 more replies)
  2022-04-19 21:28 ` [PATCH v3 2/9] stdlib: Add arc4random tests Adhemerval Zanella
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC
  To: libc-alpha; +Cc: Florian Weimer

The implementation is based on scalar ChaCha20, with a global cache and
locking.  It uses getrandom, with /dev/urandom as a fallback, to get the
initial entropy, and reseeds the internal state after every 16MB of
consumed buffer.

It maintains an internal buffer which consumes at most one page on
most systems (assuming a minimum of 4k pages).  The internal buffer
reduces the number of cipher calls by amortizing them across arc4random
calls (where both the function call and lock costs are the dominating
factors).
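
The amortization can be pictured with the sketch below.  It is an
illustration under stated assumptions, not the patch code: struct
buffered_rng, refill, and buffered_fill are hypothetical names, refill
writes a fixed pattern where the patch runs ChaCha20, and the part of
each fresh buffer that the patch reserves for the next key is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 512   /* 8 blocks of 64 bytes.  */

struct buffered_rng
{
  uint8_t buf[BUF_SIZE];
  size_t have;          /* Unconsumed bytes at the end of buf.  */
  unsigned int refills; /* Number of refill calls so far.  */
};

/* Placeholder for the cipher call.  It writes a fixed pattern, so the
   output is NOT random; it only makes the refills observable.  */
static void
refill (struct buffered_rng *rng)
{
  memset (rng->buf, (int) (rng->refills & 0xff), sizeof rng->buf);
  rng->refills++;
  rng->have = sizeof rng->buf;
}

/* Copy LEN bytes to OUT, refilling only when the buffer runs dry.  */
static void
buffered_fill (struct buffered_rng *rng, void *out, size_t len)
{
  assert (rng != NULL && (out != NULL || len == 0));
  uint8_t *p = out;
  while (len > 0)
    {
      if (rng->have == 0)
        refill (rng);
      size_t m = len < rng->have ? len : rng->have;
      uint8_t *ks = rng->buf + sizeof rng->buf - rng->have;
      memcpy (p, ks, m);
      memset (ks, 0, m);        /* Erase bytes already handed out.  */
      p += m;
      len -= m;
      rng->have -= m;
    }
}
```

With a 512-byte buffer, 128 consecutive 4-byte requests trigger a
single refill; the 129th triggers the second.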

The ChaCha20 implementation is based on RFC8439 [1], with the last
step, which XORs with the input, omitted.  Since the input stream is
either zero bytes (initial state) or the PRNG output itself, this step
does not add any extra entropy.
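
For readers unfamiliar with the RFC, the building block of its block
function is the quarter round.  The sketch below is a standalone
rendering of that operation; the function names and the pointer-based
signature are choices made for this illustration (the patch uses a
QROUND macro instead).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate WORD left by SHIFT bits; SHIFT must be in the range 1..31.  */
static inline uint32_t
rotl32 (uint32_t word, unsigned int shift)
{
  assert (shift > 0 && shift < 32);
  return (word << shift) | (word >> (32 - shift));
}

/* ChaCha quarter round on four state words (RFC 8439, section 2.1):
   four add/xor/rotate steps with rotation counts 16, 12, 8, and 7.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d = rotl32 (*d ^ *a, 16);
  *c += *d; *b = rotl32 (*b ^ *c, 12);
  *a += *b; *d = rotl32 (*d ^ *a, 8);
  *c += *d; *b = rotl32 (*b ^ *c, 7);
}
```

With the inputs a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, and
d = 0x01234567 from the test vector in section 2.1.1 of the RFC, the
result is a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, and
d = 0x5881c4bb.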

The arc4random_uniform is based on previous work by Florian Weimer.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>

[1] https://datatracker.ietf.org/doc/html/rfc8439
---
 NEWS                                          |   4 +-
 include/stdlib.h                              |  13 +
 posix/fork.c                                  |   2 +
 stdlib/Makefile                               |   2 +
 stdlib/Versions                               |   5 +
 stdlib/arc4random.c                           | 245 ++++++++++++++++++
 stdlib/arc4random_uniform.c                   | 152 +++++++++++
 stdlib/chacha20.c                             | 163 ++++++++++++
 stdlib/stdlib.h                               |  14 +
 sysdeps/generic/not-cancel.h                  |   2 +
 sysdeps/mach/hurd/i386/libc.abilist           |   3 +
 sysdeps/mach/hurd/not-cancel.h                |   3 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
 .../sysv/linux/microblaze/be/libc.abilist     |   3 +
 .../sysv/linux/microblaze/le/libc.abilist     |   3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
 46 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c

diff --git a/NEWS b/NEWS
index 4b6d9de2b5..4d9d95b35b 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,9 @@ Version 2.36
 
 Major new features:
 
-  [Add new features here]
+* The functions arc4random, arc4random_buf, and arc4random_uniform have
+  been added.  The functions use a cryptographic pseudo-random number
+  generator based on ChaCha20 initialized with entropy from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index 1c6f70b082..055f9d2965 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
 libc_hidden_proto (grantpt)
 libc_hidden_proto (unlockpt)
 
+__typeof (arc4random) __arc4random;
+libc_hidden_proto (__arc4random);
+__typeof (arc4random_buf) __arc4random_buf;
+libc_hidden_proto (__arc4random_buf);
+__typeof (arc4random_uniform) __arc4random_uniform;
+libc_hidden_proto (__arc4random_uniform);
+extern void __arc4random_buf_internal (void *buffer, size_t len)
+     attribute_hidden;
+/* Called from the fork function to reinitialize the internal lock in the
+   child process.  This avoids deadlocks if fork is called in multi-threaded
+   processes.  */
+extern void __arc4random_fork_subprocess (void) attribute_hidden;
+
 extern double __strtod_internal (const char *__restrict __nptr,
 				 char **__restrict __endptr, int __group)
      __THROW __nonnull ((1)) __wur;
diff --git a/posix/fork.c b/posix/fork.c
index 6b50c091f9..87d8329b46 100644
--- a/posix/fork.c
+++ b/posix/fork.c
@@ -96,6 +96,8 @@ __libc_fork (void)
 				     &nss_database_data);
 	}
 
+      call_function_static_weak (__arc4random_fork_subprocess);
+
       /* Reset the lock the dynamic loader uses to protect its data.  */
       __rtld_lock_initialize (GL(dl_load_lock));
 
diff --git a/stdlib/Makefile b/stdlib/Makefile
index 60fc59c12c..9f9cc1bd7f 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -53,6 +53,8 @@ routines := \
   a64l \
   abort \
   abs \
+  arc4random \
+  arc4random_uniform \
   at_quick_exit \
   atof \
   atoi \
diff --git a/stdlib/Versions b/stdlib/Versions
index 5e9099a153..d09a308fb5 100644
--- a/stdlib/Versions
+++ b/stdlib/Versions
@@ -136,6 +136,11 @@ libc {
     strtof32; strtof64; strtof32x;
     strtof32_l; strtof64_l; strtof32x_l;
   }
+  GLIBC_2.36 {
+    arc4random;
+    arc4random_buf;
+    arc4random_uniform;
+  }
   GLIBC_PRIVATE {
     # functions which have an additional interface since they are
     # are cancelable.
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
new file mode 100644
index 0000000000..cddb0e405a
--- /dev/null
+++ b/stdlib/arc4random.c
@@ -0,0 +1,245 @@
+/* Pseudo Random Number Generator based on ChaCha20.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <libc-lock.h>
+#include <not-cancel.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <sys/param.h>
+#include <sys/random.h>
+
+/* Besides the cipher state 'ctx', the state keeps two counters: 'have' is
+   the number of valid bytes not yet consumed in 'buf', while 'count' is the
+   maximum number of bytes that can be produced before a reseed.
+
+   Both the initial seed and the reseed try to obtain entropy from the
+   kernel, and abort the process if none can be obtained.
+
+   The buffer 'buf' amortizes the cipher calls: it allows the use of
+   optimized implementations (if the architecture provides one) and speeds
+   up arc4random, since the cipher runs only once per several calls.
+ */
+
+/* Maximum number of bytes until reseed (16 MB).  */
+#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
+/* Internal buffer size in bytes (512 bytes, 8 ChaCha20 blocks).  */
+#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
+
+#include <chacha20.c>
+
+static struct arc4random_state
+{
+  uint32_t ctx[CHACHA20_STATE_LEN];
+  size_t have;
+  size_t count;
+  uint8_t buf[CHACHA20_BUFSIZE];
+} *state;
+
+/* Indicate that MADV_WIPEONFORK is supported by the kernel, in which case
+   the fork handler does not need to clear the internal state.  */
+static bool __arc4random_wipeonfork = false;
+
+__libc_lock_define_initialized (, __arc4random_lock);
+
+/* Called from the fork function to reset the state if MADV_WIPEONFORK is
+   not supported and to reinit the internal lock.  */
+void
+__arc4random_fork_subprocess (void)
+{
+  if (!__arc4random_wipeonfork && state != NULL)
+    memset (state, 0, sizeof (struct arc4random_state));
+
+  __libc_lock_init (__arc4random_lock);
+}
+
+static void
+arc4random_allocate_failure (void)
+{
+  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
+}
+
+static void
+arc4random_getrandom_failure (void)
+{
+  __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
+}
+
+/* Fork detection is done by checking whether MADV_WIPEONFORK is supported.
+   If it is not, the fork callback resets the state on the fork call.  This
+   does not handle direct clone calls, nor vfork or _Fork (arc4random is not
+   async-signal-safe due to the internal lock usage).  */
+static void
+arc4random_init (uint8_t *buf, size_t len)
+{
+  state = __mmap (NULL, sizeof (struct arc4random_state),
+		  PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+  if (state == MAP_FAILED)
+    arc4random_allocate_failure ();
+
+#ifdef MADV_WIPEONFORK
+  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
+  if (r == 0)
+    __arc4random_wipeonfork = true;
+  else if (errno != EINVAL)
+    arc4random_allocate_failure ();
+#endif
+
+  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
+}
+
+#define min(x,y) (((x) > (y)) ? (y) : (x))
+
+static void
+arc4random_rekey (uint8_t *rnd, size_t rndlen)
+{
+  memset (state->buf, 0, sizeof state->buf);
+  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+
+  /* Mix some extra entropy if provided.  */
+  if (rnd != NULL)
+    {
+      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+      for (size_t i = 0; i < m; i++)
+	state->buf[i] ^= rnd[i];
+    }
+
+  /* Immediately reinit for backtracking resistance.  */
+  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
+  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+}
+
+static void
+arc4random_getentropy (uint8_t *rnd, size_t len)
+{
+  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
+    return;
+
+  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);
+  if (fd != -1)
+    {
+      unsigned char *p = rnd;
+      unsigned char *end = p + len;
+      do
+	{
+	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
+	  if (ret <= 0)
+	    arc4random_getrandom_failure ();
+	  p += ret;
+	}
+      while (p < end);
+
+      if (__close_nocancel (fd) == 0)
+	return;
+    }
+  arc4random_getrandom_failure ();
+}
+
+/* Either allocates the state buffer or reinitializes it by reseeding the
+   cipher state with kernel entropy.  */
+static void
+arc4random_stir (void)
+{
+  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
+  arc4random_getentropy (rnd, sizeof rnd);
+
+  if (state == NULL)
+    arc4random_init (rnd, sizeof rnd);
+  else
+    arc4random_rekey (rnd, sizeof rnd);
+
+  explicit_bzero (rnd, sizeof rnd);
+
+  state->have = 0;
+  memset (state->buf, 0, sizeof state->buf);
+  state->count = CHACHE_RESEED_SIZE;
+}
+
+static void
+arc4random_check_stir (size_t len)
+{
+  if (state == NULL || state->count < len)
+    arc4random_stir ();
+  if (state->count <= len)
+    state->count = 0;
+  else
+    state->count -= len;
+}
+
+void
+__arc4random_buf_internal (void *buffer, size_t len)
+{
+  arc4random_check_stir (len);
+
+  while (len > 0)
+    {
+      if (state->have > 0)
+	{
+	  size_t m = min (len, state->have);
+	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
+	  memcpy (buffer, ks, m);
+	  memset (ks, 0, m);
+	  buffer += m;
+	  len -= m;
+	  state->have -= m;
+	}
+      if (state->have == 0)
+	arc4random_rekey (NULL, 0);
+    }
+}
+
+void
+__arc4random_buf (void *buffer, size_t len)
+{
+  __libc_lock_lock (__arc4random_lock);
+  __arc4random_buf_internal (buffer, len);
+  __libc_lock_unlock (__arc4random_lock);
+}
+libc_hidden_def (__arc4random_buf)
+weak_alias (__arc4random_buf, arc4random_buf)
+
+
+static uint32_t
+__arc4random_internal (void)
+{
+  uint32_t r;
+
+  arc4random_check_stir (sizeof (uint32_t));
+  if (state->have < sizeof (uint32_t))
+    arc4random_rekey (NULL, 0);
+  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
+  memcpy (&r, ks, sizeof (uint32_t));
+  memset (ks, 0, sizeof (uint32_t));
+  state->have -= sizeof (uint32_t);
+
+  return r;
+}
+
+uint32_t
+__arc4random (void)
+{
+  uint32_t r;
+  __libc_lock_lock (__arc4random_lock);
+  r = __arc4random_internal ();
+  __libc_lock_unlock (__arc4random_lock);
+  return r;
+}
+libc_hidden_def (__arc4random)
+weak_alias (__arc4random, arc4random)
diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
new file mode 100644
index 0000000000..96ffe62df1
--- /dev/null
+++ b/stdlib/arc4random_uniform.c
@@ -0,0 +1,152 @@
+/* Generate pseudo-random numbers uniformly distributed between 0
+   (inclusive) and an upper bound (exclusive).
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <endian.h>
+#include <libc-lock.h>
+#include <stdlib.h>
+#include <sys/param.h>
+
+/* Return the number of bytes which cover values up to the limit.  */
+__attribute__ ((const))
+static uint32_t
+byte_count (uint32_t n)
+{
+  if (n <= (1U << 8))
+    return 1;
+  else if (n <= (1U << 16))
+    return 2;
+  else if (n <= (1U << 24))
+    return 3;
+  else
+    return 4;
+}
+
+/* Fill the lower bits of the result with randomness, according to the
+   number of bytes requested.  */
+static void
+random_bytes (uint32_t *result, uint32_t byte_count)
+{
+  *result = 0;
+  unsigned char *ptr = (unsigned char *) result;
+  if (__BYTE_ORDER == __BIG_ENDIAN)
+    ptr += 4 - byte_count;
+  __arc4random_buf_internal (ptr, byte_count);
+}
+
+static uint32_t
+compute_uniform (uint32_t n)
+{
+  if (n <= 1)
+    /* There is no valid return value for a zero limit, and 0 is the
+       only possible result for limit 1.  */
+    return 0;
+
+  /* The bits variable serves as a source for bits.  Prefetch the
+     minimum number of bytes needed.  */
+  unsigned count = byte_count (n);
+  uint32_t bits_length = count * CHAR_BIT;
+  uint32_t bits;
+  random_bytes (&bits, count);
+
+  /* Powers of two are easy.  */
+  if (powerof2 (n))
+    return bits & (n - 1);
+
+  /* The general case.  This algorithm follows Jérémie Lumbroso,
+     Optimal Discrete Uniform Generation from Coin Flips, and
+     Applications (2013), who credits Donald E. Knuth and Andrew
+     C. Yao, The complexity of nonuniform random number generation
+     (1976), for solving the general case.
+
+     The implementation below unrolls the initialization stage of the
+     loop, where v is less than n.  */
+
+  /* Use 64-bit variables even though the intermediate results are
+     never larger than 33 bits.  This makes the code easier to
+     compile on 64-bit architectures.  */
+  uint64_t v;
+  uint64_t c;
+
+  /* Initialize v and c.  v is the smallest power of 2 which is larger
+     than n.  */
+  {
+    uint32_t log2p1 = 32 - __builtin_clz (n);
+    v = 1ULL << log2p1;
+    c = bits & (v - 1);
+    bits >>= log2p1;
+    bits_length -= log2p1;
+  }
+
+  /* At the start of the loop, c is uniformly distributed within the
+     half-open interval [0, v), and v < 2n < 2**33.  */
+  while (true)
+    {
+      if (v >= n)
+        {
+          /* If the candidate is less than n, accept it.  */
+          if (c < n)
+            /* c is uniformly distributed on [0, n).  */
+            return c;
+          else
+            {
+              /* c is uniformly distributed on [n, v).  */
+              v -= n;
+              c -= n;
+              /* The distribution was shifted, so c is uniformly
+                 distributed on [0, v) again.  */
+            }
+        }
+      /* v < n here.  */
+
+      /* Replenish the bit source if necessary.  */
+      if (bits_length == 0)
+        {
+          /* Overwrite the least significant byte.  */
+	  random_bytes (&bits, 1);
+	  bits_length = CHAR_BIT;
+        }
+
+      /* Double the range.  No overflow because v < n < 2**32.  */
+      v *= 2;
+      /* v < 2n here.  */
+
+      /* Extract a bit and append it to c.  c remains less than v and
+         thus 2**33.  */
+      c = (c << 1) | (bits & 1);
+      bits >>= 1;
+      --bits_length;
+
+      /* At this point, c is uniformly distributed on [0, v) again,
+         and v < 2n < 2**33.  */
+    }
+}
+
+__libc_lock_define (extern , __arc4random_lock attribute_hidden)
+
+uint32_t
+__arc4random_uniform (uint32_t upper_bound)
+{
+  uint32_t r;
+  __libc_lock_lock (__arc4random_lock);
+  r = compute_uniform (upper_bound);
+  __libc_lock_unlock (__arc4random_lock);
+  return r;
+}
+libc_hidden_def (__arc4random_uniform)
+weak_alias (__arc4random_uniform, arc4random_uniform)
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
new file mode 100644
index 0000000000..af4ffa9860
--- /dev/null
+++ b/stdlib/chacha20.c
@@ -0,0 +1,163 @@
+/* Generic ChaCha20 implementation (used by arc4random).
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <endian.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* 32-bit stream position, then 96-bit nonce.  */
+#define CHACHA20_IV_SIZE	16
+#define CHACHA20_KEY_SIZE	32
+
+#define CHACHA20_BLOCK_SIZE     64
+#define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
+
+#define CHACHA20_STATE_LEN	16
+
+/* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
+   by RFC8439.  Since the input stream will be either zero bytes (initial
+   state) or the PRNG output itself, this step adds no extra entropy.  */
+
+enum chacha20_constants
+{
+  CHACHA20_CONSTANT_EXPA = 0x61707865U,
+  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
+  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
+  CHACHA20_CONSTANT_TE_K = 0x6b206574U
+};
+
+static inline uint32_t
+read_unaligned_32 (const uint8_t *p)
+{
+  uint32_t r;
+  memcpy (&r, p, sizeof (r));
+  return r;
+}
+
+static inline void
+write_unaligned_32 (uint8_t *p, uint32_t v)
+{
+  memcpy (p, &v, sizeof (v));
+}
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
+# define set_state(v)		__builtin_bswap32 ((v))
+#else
+# define read_unaligned_le32(p) read_unaligned_32 ((p))
+# define set_state(v)		(v)
+#endif
+
+static inline void
+chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
+{
+  state[0]  = CHACHA20_CONSTANT_EXPA;
+  state[1]  = CHACHA20_CONSTANT_ND_3;
+  state[2]  = CHACHA20_CONSTANT_2_BY;
+  state[3]  = CHACHA20_CONSTANT_TE_K;
+
+  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
+  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
+  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
+  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
+  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
+  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
+  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
+  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
+
+  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
+  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
+  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
+  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
+}
+
+static inline uint32_t
+rotl32 (unsigned int shift, uint32_t word)
+{
+  return (word << (shift & 31)) | (word >> ((-shift) & 31));
+}
+
+#define QROUND(x0, x1, x2, x3) 			\
+  do {						\
+   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3)); 	\
+   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2)); 	\
+   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));	\
+   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));	\
+  } while(0)
+
+static inline void
+chacha20_block (uint32_t *state, uint32_t *stream)
+{
+  uint32_t x[CHACHA20_STATE_LEN];
+  memcpy (x, state, sizeof x);
+
+  for (int i = 0; i < 20; i += 2)
+    {
+      QROUND (x[0], x[4], x[8],  x[12]);
+      QROUND (x[1], x[5], x[9],  x[13]);
+      QROUND (x[2], x[6], x[10], x[14]);
+      QROUND (x[3], x[7], x[11], x[15]);
+
+      QROUND (x[0], x[5], x[10], x[15]);
+      QROUND (x[1], x[6], x[11], x[12]);
+      QROUND (x[2], x[7], x[8],  x[13]);
+      QROUND (x[3], x[4], x[9],  x[14]);
+    }
+
+  /* Unroll the loop a bit.  */
+  for (int i = 0; i < CHACHA20_BLOCK_WORDS / 4; i++)
+    {
+      stream[i*4+0] = set_state (x[i*4+0] + state[i*4+0]);
+      stream[i*4+1] = set_state (x[i*4+1] + state[i*4+1]);
+      stream[i*4+2] = set_state (x[i*4+2] + state[i*4+2]);
+      stream[i*4+3] = set_state (x[i*4+3] + state[i*4+3]);
+    }
+
+  state[12]++;
+}
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  uint32_t stream[CHACHA20_BLOCK_WORDS];
+
+  while (bytes >= CHACHA20_BLOCK_SIZE)
+    {
+      chacha20_block (state, stream);
+#ifdef CHACHA20_XOR_FINAL
+      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
+	stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
+#endif
+      memcpy (dst, stream, CHACHA20_BLOCK_SIZE);
+      bytes -= CHACHA20_BLOCK_SIZE;
+      dst += CHACHA20_BLOCK_SIZE;
+      src += CHACHA20_BLOCK_SIZE;
+    }
+  if (bytes != 0)
+    {
+      chacha20_block (state, stream);
+#ifdef CHACHA20_XOR_FINAL
+      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
+	stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
+#endif
+      memcpy (dst, stream, bytes);
+    }
+}
diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
index bf7cd438e1..f2b0c83c12 100644
--- a/stdlib/stdlib.h
+++ b/stdlib/stdlib.h
@@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
 extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
 
 # ifdef __USE_MISC
+#  include <bits/stdint-uintn.h>
 /* Data structure for communication with thread safe versions.  This
    type is to be regarded as opaque.  It's only exported because users
    have to allocate objects of this type.  */
@@ -533,6 +534,19 @@ extern int seed48_r (unsigned short int __seed16v[3],
 extern int lcong48_r (unsigned short int __param[7],
 		      struct drand48_data *__buffer)
      __THROW __nonnull ((1, 2));
+
+/* Return a random integer between zero and 2**32-1 (inclusive).  */
+extern uint32_t arc4random (void)
+     __THROW __wur;
+
+/* Fill the buffer with random data.  */
+extern void arc4random_buf (void *__buf, size_t __size)
+     __THROW __nonnull ((1));
+
+/* Return a random number between zero (inclusive) and the specified
+   limit (exclusive).  */
+extern uint32_t arc4random_uniform (uint32_t __upper_bound)
+     __THROW __wur;
 # endif	/* Use misc.  */
 #endif	/* Use misc or X/Open.  */
 
diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
index 2104efeb54..f4882a9ffd 100644
--- a/sysdeps/generic/not-cancel.h
+++ b/sysdeps/generic/not-cancel.h
@@ -48,5 +48,7 @@
   (void) __writev (fd, iov, n)
 #define __fcntl64_nocancel(fd, cmd, ...) \
   __fcntl64 (fd, cmd, __VA_ARGS__)
+#define __getrandomn_nocancel(buf, size, flags) \
+  __getrandom (buf, size, flags)
 
 #endif /* NOT_CANCEL_H  */
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 4dc87e9061..7bd565103b 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2289,6 +2289,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 close_range F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 6ec92ced84..39edfe76b6 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -74,6 +74,9 @@ __typeof (__fcntl) __fcntl_nocancel;
 #define __fcntl64_nocancel(...) \
   __fcntl_nocancel (__VA_ARGS__)
 
+#define __getrandomn_nocancel(buf, size, flags) \
+  __getrandom (buf, size, flags)
+
 #if IS_IN (libc)
 hidden_proto (__close_nocancel)
 hidden_proto (__close_nocancel_nostatus)
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 1b63d9e447..f8f38bb205 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2616,3 +2616,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index e7e4cf7d2a..9de1726de0 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2713,6 +2713,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index bc3d228e31..16e2532838 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2377,3 +2377,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index db7039c4ab..ae9e465088 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -496,6 +496,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index d2add4fb49..b669f43194 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -493,6 +493,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 355d72a30c..42daa90248 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2652,3 +2652,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 3df39bb28c..090be20f53 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2601,6 +2601,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index c4da358f80..6b7cf064bb 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2785,6 +2785,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 241bac70ea..3e766f64dd 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2551,6 +2551,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 78bf372b72..c0b99199a8 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -497,6 +497,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 00df5c901f..4d0be7c86d 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2728,6 +2728,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index e8118569c3..b944680ede 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2701,3 +2701,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index c0d2373e64..28f7d19983 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2698,3 +2698,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 2d0fd04f54..3da7cdaca5 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2693,6 +2693,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index e39ccfb312..9fe87f15be 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2691,6 +2691,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 1e900f86e4..c14fca2111 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2699,6 +2699,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 9145ba7931..a363830226 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2602,6 +2602,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index e95d60d926..89b6f98667 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2740,3 +2740,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 75b9e0ee1e..be5df35927 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -67,6 +67,13 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
   INTERNAL_SYSCALL_CALL (writev, fd, iov, iovcnt);
 }
 
+static inline int
+__getrandomn_nocancel (void *buf, size_t buflen, unsigned int flags)
+{
+  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
+}
+
+
 /* Uncancelable fcntl.  */
 __typeof (__fcntl) __fcntl64_nocancel;
 
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index ca934e374b..94c0ff9526 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2123,3 +2123,6 @@ GLIBC_2.35 wprintf F
 GLIBC_2.35 write F
 GLIBC_2.35 writev F
 GLIBC_2.35 wscanf F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 3820b9f235..d6188de00b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2755,6 +2755,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 464dc27fcd..8201230059 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2788,6 +2788,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 2f7e58747f..623505d783 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2510,6 +2510,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 4f3043d913..23b0d83408 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2812,3 +2812,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 84b6ac815a..a72e8ed9cc 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2379,3 +2379,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 4d5c19c56a..f3faecc2ae 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2579,3 +2579,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 7c5ee8d569..105e5a9231 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2753,6 +2753,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 50de0b46cf..c08c6c8301 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2547,6 +2547,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 66fba013ca..8ec1005644 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2608,6 +2608,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 38703f8aa0..5d776576f9 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2605,6 +2605,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 6df55eb765..f5f07f612e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2748,6 +2748,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index b90569d881..be687ebe02 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2574,6 +2574,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index e88b0f101f..7f456fbb55 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2525,6 +2525,9 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index e0755272eb..c737201248 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2631,3 +2631,6 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 arc4random F
+GLIBC_2.36 arc4random_buf F
+GLIBC_2.36 arc4random_uniform F
-- 
2.32.0



* [PATCH v3 2/9] stdlib: Add arc4random tests
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 3/9] benchtests: Add arc4random benchtest Adhemerval Zanella
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

The basic tst-arc4random-chacha20.c test checks whether the output of the
ChaCha20 implementation matches the reference test vectors from RFC 8439.
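
As a rough illustration of this kind of check (a standalone sketch, not
the code in tst-arc4random-chacha20.c), the following computes ChaCha20
keystream blocks for an all-zero key and nonce and compares their leading
bytes with the RFC 8439 values.  The names sketch_chacha20_block and
sketch_check_zero_vectors are hypothetical, and the output is serialized
as little-endian as in the RFC.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32).  */
static uint32_t
sketch_rotl32 (uint32_t w, unsigned int n)
{
  return (w << n) | (w >> (32 - n));
}

/* ChaCha quarter round as described in RFC 8439.  */
#define SKETCH_QR(a, b, c, d)                     \
  do {                                            \
    a += b; d = sketch_rotl32 (d ^ a, 16);        \
    c += d; b = sketch_rotl32 (b ^ c, 12);        \
    a += b; d = sketch_rotl32 (d ^ a, 8);         \
    c += d; b = sketch_rotl32 (b ^ c, 7);         \
  } while (0)

/* Produce one 64-byte keystream block.  The state holds 4 constant
   words, 8 key words, the block counter, and 3 nonce words.  */
static void
sketch_chacha20_block (const uint32_t state[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, state, sizeof x);

  /* 20 rounds: 10 iterations of a column round plus a diagonal round.  */
  for (int i = 0; i < 10; i++)
    {
      SKETCH_QR (x[0], x[4], x[8],  x[12]);
      SKETCH_QR (x[1], x[5], x[9],  x[13]);
      SKETCH_QR (x[2], x[6], x[10], x[14]);
      SKETCH_QR (x[3], x[7], x[11], x[15]);

      SKETCH_QR (x[0], x[5], x[10], x[15]);
      SKETCH_QR (x[1], x[6], x[11], x[12]);
      SKETCH_QR (x[2], x[7], x[8],  x[13]);
      SKETCH_QR (x[3], x[4], x[9],  x[14]);
    }

  /* Add the input state and store each word in little-endian order.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t v = x[i] + state[i];
      out[4 * i + 0] = v & 0xff;
      out[4 * i + 1] = (v >> 8) & 0xff;
      out[4 * i + 2] = (v >> 16) & 0xff;
      out[4 * i + 3] = (v >> 24) & 0xff;
    }
}

/* Return 0 if the leading keystream bytes for an all-zero key and
   nonce match the reference values for block counters 0 and 1.  */
int
sketch_check_zero_vectors (void)
{
  static const uint8_t block0[16] =
    {
      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90,
      0x40, 0x5d, 0x6a, 0xe5, 0x53, 0x86, 0xbd, 0x28
    };
  static const uint8_t block1[8] =
    {
      0x9f, 0x07, 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a
    };
  /* "expand 32-byte k" constants; key, counter and nonce stay zero.  */
  uint32_t state[16] =
    {
      0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
    };
  uint8_t out[64];

  sketch_chacha20_block (state, out);
  if (memcmp (out, block0, sizeof block0) != 0)
    return -1;

  state[12] = 1;   /* Next block.  */
  sketch_chacha20_block (state, out);
  if (memcmp (out, block1, sizeof block1) != 0)
    return -1;

  return 0;
}
```

The byte values used here are the same ones that open the first and
second 64-byte blocks of the expected1 array in the test.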

The tst-arc4random-fork.c test checks whether subprocesses generate
distinct streams of randomness (that is, whether fork is handled correctly).
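
The property being tested can be sketched as follows.  This is only an
illustration under stated assumptions, not the logic of
tst-arc4random-fork.c: sketch_fork_streams_differ and
sketch_stateful_random are hypothetical names, and the stand-in
generator deliberately keeps plain static state so that it shows the
failure mode (parent and child producing the same value after fork).

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in generator (an assumption for illustration only): a plain
   LCG whose state is copied verbatim into the child by fork.  */
static uint32_t sketch_lcg_state = 12345;

uint32_t
sketch_stateful_random (void)
{
  sketch_lcg_state = sketch_lcg_state * 1664525u + 1013904223u;
  return sketch_lcg_state;
}

/* Draw one value in a forked child and one in the parent.  Return 1
   if they differ, 0 if they are equal, and -1 on error.  */
int
sketch_fork_streams_differ (uint32_t (*gen) (void))
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: report the first value drawn after fork.  */
      uint32_t v = gen ();
      ssize_t n = write (fds[1], &v, sizeof v);
      _exit (n == (ssize_t) sizeof v ? 0 : 1);
    }

  /* Parent: collect the child's value, then draw its own.  */
  close (fds[1]);
  uint32_t child_value = 0;
  ssize_t n = read (fds[0], &child_value, sizeof child_value);
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) != pid
      || n != (ssize_t) sizeof child_value)
    return -1;

  uint32_t parent_value = gen ();
  return child_value != parent_value;
}
```

With the stand-in generator the function returns 0, since both
processes continue from the same copied state.  Passing arc4random
instead (where the C library provides it) would be expected to return
1, apart from a 2**-32 chance of a coincidental match on a single
32-bit draw.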

The tst-arc4random-stats.c test is a statistical test of the randomness of
arc4random, arc4random_buf, and arc4random_uniform.
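
One simple statistical check, shown here only as a sketch and not
necessarily what tst-arc4random-stats.c does, is a per-bit frequency
test: over many samples each of the 32 output bits should be set about
half the time.  All names below are hypothetical, and the stand-in
generators exist only so the sketch runs without arc4random.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in generator (assumption): splitmix64, upper 32 bits.  */
static uint64_t sketch_sm_state = 0x1234567890abcdefULL;

uint32_t
sketch_standin_random (void)
{
  uint64_t z = (sketch_sm_state += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  z ^= z >> 31;
  return (uint32_t) (z >> 32);
}

/* Deliberately broken generator used to show a failing case.  */
uint32_t
sketch_stuck_random (void)
{
  return 0;
}

/* Return 1 if every bit position is set in a fraction of the samples
   within TOLERANCE of 0.5, and 0 otherwise.  */
int
sketch_bit_frequency_ok (uint32_t (*gen) (void), size_t samples,
                         double tolerance)
{
  size_t ones[32] = { 0 };

  if (samples == 0)
    return 0;

  for (size_t i = 0; i < samples; i++)
    {
      uint32_t v = gen ();
      for (int bit = 0; bit < 32; bit++)
        ones[bit] += (v >> bit) & 1u;
    }

  for (int bit = 0; bit < 32; bit++)
    {
      double frac = (double) ones[bit] / (double) samples;
      if (frac < 0.5 - tolerance || frac > 0.5 + tolerance)
        return 0;
    }
  return 1;
}
```

A bit frequency test is necessary but far from sufficient: it rejects
a stuck or heavily biased source, yet many weak generators pass it.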

The tst-arc4random-thread.c test checks whether threads generate distinct
streams of randomness (that is, whether the functions are thread-safe).

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
---
 stdlib/Makefile                  |   7 +
 stdlib/tst-arc4random-chacha20.c | 166 ++++++++++++++++++
 stdlib/tst-arc4random-fork.c     | 174 +++++++++++++++++++
 stdlib/tst-arc4random-stats.c    | 146 ++++++++++++++++
 stdlib/tst-arc4random-thread.c   | 278 +++++++++++++++++++++++++++++++
 5 files changed, 771 insertions(+)
 create mode 100644 stdlib/tst-arc4random-chacha20.c
 create mode 100644 stdlib/tst-arc4random-fork.c
 create mode 100644 stdlib/tst-arc4random-stats.c
 create mode 100644 stdlib/tst-arc4random-thread.c

diff --git a/stdlib/Makefile b/stdlib/Makefile
index 9f9cc1bd7f..c29faf2e43 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -183,6 +183,9 @@ tests := \
   testmb2 \
   testrand \
   testsort \
+  tst-arc4random-fork \
+  tst-arc4random-stats \
+  tst-arc4random-thread \
   tst-at_quick_exit \
   tst-atexit \
   tst-atof1 \
@@ -243,6 +246,7 @@ tests := \
   # tests
 
 tests-internal := \
+  tst-arc4random-chacha20 \
   tst-strtod1i \
   tst-strtod3 \
   tst-strtod4 \
@@ -252,6 +256,7 @@ tests-internal := \
   # tests-internal
 
 tests-static := \
+  tst-arc4random-chacha20 \
   tst-secure-getenv \
   # tests-static
 
@@ -271,6 +276,8 @@ LDLIBS-test-cxa_atexit-race = $(shared-thread-library)
 LDLIBS-test-cxa_atexit-race2 = $(shared-thread-library)
 LDLIBS-test-on_exit-race = $(shared-thread-library)
 LDLIBS-tst-canon-bz26341 = $(shared-thread-library)
+LDLIBS-tst-arc4random-fork = $(shared-thread-library)
+LDLIBS-tst-arc4random-thread = $(shared-thread-library)
 
 LDLIBS-test-dlclose-exit-race = $(shared-thread-library)
 LDFLAGS-test-dlclose-exit-race = $(LDFLAGS-rdynamic)
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
new file mode 100644
index 0000000000..dd0ef6d8ba
--- /dev/null
+++ b/stdlib/tst-arc4random-chacha20.c
@@ -0,0 +1,166 @@
+/* Basic tests for the ChaCha20 cipher used in arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <sys/cdefs.h>
+
+/* Do not define CHACHA20_XOR_FINAL, to check what glibc actually uses.  */
+#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
+#include <chacha20.c>
+
+static int
+do_test (void)
+{
+  const uint8_t key[CHACHA20_KEY_SIZE] =
+    {
+      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+    };
+  const uint8_t iv[CHACHA20_IV_SIZE] =
+    {
+      0x0, 0x0, 0x0, 0x0,
+      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+    };
+  const uint8_t expected1[CHACHA20_BUFSIZE] =
+    {
+      0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a,
+      0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d,
+      0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda,
+      0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f,
+      0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1,
+      0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07,
+      0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73,
+      0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69,
+      0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21,
+      0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8,
+      0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac,
+      0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6,
+      0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0,
+      0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3,
+      0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f,
+      0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e,
+      0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71,
+      0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3,
+      0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5,
+      0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34,
+      0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d,
+      0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c,
+      0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 0x6c, 0x31,
+      0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad,
+      0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0,
+      0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72,
+      0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16,
+      0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9,
+      0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a,
+      0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc,
+      0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2,
+      0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb,
+      0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10,
+      0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b,
+      0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2,
+      0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17,
+      0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca,
+      0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84,
+      0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3,
+      0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1,
+      0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7,
+      0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9,
+      0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3,
+      0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e,
+      0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb,
+      0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8,
+      0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb
+    };
+
+  const uint8_t expected2[CHACHA20_BUFSIZE] =
+    {
+      0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36,
+      0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22,
+      0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50,
+      0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10,
+      0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 0xbb, 0xef,
+      0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30,
+      0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c,
+      0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6,
+      0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e,
+      0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90,
+      0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3,
+      0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba,
+      0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4,
+      0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea,
+      0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a,
+      0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80,
+      0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2,
+      0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8,
+      0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce,
+      0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb,
+      0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9,
+      0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc,
+      0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86,
+      0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0,
+      0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4,
+      0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1,
+      0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4,
+      0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f,
+      0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57,
+      0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97,
+      0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef,
+      0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13,
+      0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65,
+      0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30,
+      0x17, 0xdd, 0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6,
+      0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee,
+      0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c,
+      0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66,
+      0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a,
+      0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c,
+      0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65,
+      0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b,
+      0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88,
+      0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1,
+      0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0,
+      0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36,
+      0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18
+    };
+
+  /* Check with the expected internal arc4random keystream buffer.  Some
+     architecture optimizations expect a buffer whose size is a
+     multiple of the ChaCha20 block size, so they might not be prepared
+     to handle smaller buffers.  */
+
+  uint8_t output[CHACHA20_BUFSIZE];
+
+  uint32_t state[CHACHA20_STATE_LEN];
+  chacha20_init (state, key, iv);
+
+  /* Check with the initial state.  */
+  uint8_t input[CHACHA20_BUFSIZE] = { 0 };
+
+  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
+  TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE);
+
+  /* And on the next round.  */
+  chacha20_crypt (state, output, input, CHACHA20_BUFSIZE);
+  TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/stdlib/tst-arc4random-fork.c b/stdlib/tst-arc4random-fork.c
new file mode 100644
index 0000000000..cd8852c8d3
--- /dev/null
+++ b/stdlib/tst-arc4random-fork.c
@@ -0,0 +1,174 @@
+/* Test that subprocesses generate distinct streams of randomness.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Collect random data from subprocesses and check that all the
+   results are unique.  */
+
+#include <array_length.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <unistd.h>
+
+/* Perform multiple runs.  The subsequent runs start with an
+   already-initialized random number generator.  (The number 1500 was
+   seen to reproduce failures reliably in case of a race condition in
+   the fork detection code.)  */
+enum { runs = 1500 };
+
+/* Fifty processes in total.  This should be high enough to
+   expose any issues, but low enough not to tax the overall system too
+   much.  */
+enum { subprocesses = 49 };
+
+/* The total number of processes.  */
+enum { processes = subprocesses + 1 };
+
+/* Number of bytes of randomness to generate per process.  Large
+   enough to make false positive duplicates extremely unlikely.  */
+enum { random_size = 16 };
+
+/* Generated bytes of randomness.  */
+struct result
+{
+  unsigned char bytes[random_size];
+};
+
+/* Shared across all processes.  */
+static struct shared_data
+{
+  pthread_barrier_t barrier;
+  struct result results[runs][processes];
+} *shared_data;
+
+/* Invoked to collect data from a subprocess.  */
+static void
+subprocess (int run, int process_index)
+{
+  xpthread_barrier_wait (&shared_data->barrier);
+  arc4random_buf (shared_data->results[run][process_index].bytes, random_size);
+}
+
+/* Used to sort the results.  */
+struct index
+{
+  int run;
+  int process_index;
+};
+
+/* Used to sort an array of struct index values.  */
+static int
+index_compare (const void *left1, const void *right1)
+{
+  const struct index *left = left1;
+  const struct index *right = right1;
+
+  return memcmp (shared_data->results[left->run][left->process_index].bytes,
+                 shared_data->results[right->run][right->process_index].bytes,
+                 random_size);
+}
+
+static int
+do_test (void)
+{
+  shared_data = support_shared_allocate (sizeof (*shared_data));
+  {
+    pthread_barrierattr_t attr;
+    xpthread_barrierattr_init (&attr);
+    xpthread_barrierattr_setpshared (&attr, PTHREAD_PROCESS_SHARED);
+    xpthread_barrier_init (&shared_data->barrier, &attr, processes);
+    xpthread_barrierattr_destroy (&attr);
+  }
+
+  /* Collect random data.  */
+  for (int run = 0; run < runs; ++run)
+    {
+#if 0
+      if (run == runs / 2)
+        {
+          /* In the middle, desynchronize the block cache by consuming
+             an odd number of bytes.  */
+          char buf;
+          arc4random_buf (&buf, 1);
+        }
+#endif
+
+      pid_t pids[subprocesses];
+      for (int process_index = 0; process_index < subprocesses;
+           ++process_index)
+        {
+          pids[process_index] = xfork ();
+          if (pids[process_index] == 0)
+            {
+              subprocess (run, process_index);
+              _exit (0);
+            }
+        }
+
+      /* Trigger all subprocesses.  Also add data from the parent
+         process.  */
+      subprocess (run, subprocesses);
+
+      for (int process_index = 0; process_index < subprocesses;
+           ++process_index)
+        {
+          int status;
+          xwaitpid (pids[process_index], &status, 0);
+          if (status != 0)
+            FAIL_EXIT1 ("subprocess index %d (PID %d) exit status %d\n",
+                        process_index, (int) pids[process_index], status);
+        }
+    }
+
+  /* Check for duplicates.  */
+  struct index indexes[runs * processes];
+  for (int run = 0; run < runs; ++run)
+    for (int process_index = 0; process_index < processes; ++process_index)
+      indexes[run * processes + process_index]
+        = (struct index) { .run = run, .process_index = process_index };
+  qsort (indexes, array_length (indexes), sizeof (indexes[0]), index_compare);
+  for (size_t i = 1; i < array_length (indexes); ++i)
+    {
+      if (index_compare (indexes + i - 1, indexes + i) == 0)
+        {
+          support_record_failure ();
+          unsigned char *bytes
+            = shared_data->results[indexes[i].run]
+                [indexes[i].process_index].bytes;
+          char *quoted = support_quote_blob (bytes, random_size);
+          printf ("error: duplicate randomness data: \"%s\"\n"
+                  "  run %d, subprocess %d\n"
+                  "  run %d, subprocess %d\n",
+                  quoted, indexes[i - 1].run, indexes[i - 1].process_index,
+                  indexes[i].run, indexes[i].process_index);
+          free (quoted);
+        }
+    }
+
+  xpthread_barrier_destroy (&shared_data->barrier);
+  support_shared_free (shared_data);
+  shared_data = NULL;
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/stdlib/tst-arc4random-stats.c b/stdlib/tst-arc4random-stats.c
new file mode 100644
index 0000000000..9747180c99
--- /dev/null
+++ b/stdlib/tst-arc4random-stats.c
@@ -0,0 +1,146 @@
+/* Statistical tests for arc4random-related functions.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+
+enum
+{
+  arc4random_key_size = 32
+};
+
+struct key
+{
+  unsigned char data[arc4random_key_size];
+};
+
+/* With 12,000 keys, the probability that a byte in a predetermined
+   position does not have a predetermined value in all generated keys
+   is about 4e-21.  The probability that this happens with any of the
+   32 * 256 possible byte position/values is about 3.3e-17.  This
+   results in an acceptably low false-positive rate.  */
+enum { key_count = 12000 };
+
+static struct key keys[key_count];
+
+/* Used to perform the distribution check.  */
+static int byte_counts[arc4random_key_size][256];
+
+/* Bail out after this many failures.  */
+enum { failure_limit = 100 };
+
+static void
+find_stuck_bytes (bool (*func) (unsigned char *key))
+{
+  memset (&keys, 0xcc, sizeof (keys));
+
+  int failures = 0;
+  for (int key = 0; key < key_count; ++key)
+    {
+      while (true)
+        {
+          if (func (keys[key].data))
+            break;
+          ++failures;
+          if (failures >= failure_limit)
+            {
+              printf ("warning: bailing out after %d failures\n", failures);
+              return;
+            }
+        }
+    }
+  printf ("info: key generation finished with %d failures\n", failures);
+
+  memset (&byte_counts, 0, sizeof (byte_counts));
+  for (int key = 0; key < key_count; ++key)
+    for (int pos = 0; pos < arc4random_key_size; ++pos)
+      ++byte_counts[pos][keys[key].data[pos]];
+
+  for (int pos = 0; pos < arc4random_key_size; ++pos)
+    for (int byte = 0; byte < 256; ++byte)
+      if (byte_counts[pos][byte] == 0)
+        {
+          support_record_failure ();
+          printf ("error: byte %d never appeared at position %d\n", byte, pos);
+        }
+}
+
+/* Test adapter for arc4random.  */
+static bool
+generate_arc4random (unsigned char *key)
+{
+  uint32_t words[arc4random_key_size / 4];
+  _Static_assert (sizeof (words) == arc4random_key_size, "sizeof (words)");
+
+  for (int i = 0; i < array_length (words); ++i)
+    words[i] = arc4random ();
+  memcpy (key, &words, arc4random_key_size);
+  return true;
+}
+
+/* Test adapter for arc4random_buf.  */
+static bool
+generate_arc4random_buf (unsigned char *key)
+{
+  arc4random_buf (key, arc4random_key_size);
+  return true;
+}
+
+/* Test adapter for arc4random_uniform.  */
+static bool
+generate_arc4random_uniform (unsigned char *key)
+{
+  for (int i = 0; i < arc4random_key_size; ++i)
+    key[i] = arc4random_uniform (256);
+  return true;
+}
+
+/* Test adapter for arc4random_uniform with argument 257.  This means
+   that byte 0 happens more often, but we do not perform such a
+   statistical check, so the test will still pass.  */
+static bool
+generate_arc4random_uniform_257 (unsigned char *key)
+{
+  for (int i = 0; i < arc4random_key_size; ++i)
+    key[i] = arc4random_uniform (257);
+  return true;
+}
+
+static int
+do_test (void)
+{
+  puts ("info: arc4random implementation test");
+  find_stuck_bytes (generate_arc4random);
+
+  puts ("info: arc4random_buf implementation test");
+  find_stuck_bytes (generate_arc4random_buf);
+
+  puts ("info: arc4random_uniform implementation test");
+  find_stuck_bytes (generate_arc4random_uniform);
+
+  puts ("info: arc4random_uniform implementation test (257 variant)");
+  find_stuck_bytes (generate_arc4random_uniform_257);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/stdlib/tst-arc4random-thread.c b/stdlib/tst-arc4random-thread.c
new file mode 100644
index 0000000000..b122eaa826
--- /dev/null
+++ b/stdlib/tst-arc4random-thread.c
@@ -0,0 +1,278 @@
+/* Test that threads generate distinct streams of randomness.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/support.h>
+#include <support/xthread.h>
+
+/* Number of arc4random_buf calls per thread.  */
+enum { count_per_thread = 5000 };
+
+/* Number of threads computing randomness.  */
+enum { inner_threads = 5 };
+
+/* Number of threads launching other threads.  Chosen so as not to
+   overload the system.  */
+enum { outer_threads = 7 };
+
+/* Number of launching rounds performed by the outer threads.  */
+enum { outer_rounds = 10 };
+
+/* Maximum number of bytes generated in an arc4random call.  */
+enum { max_size = 32 };
+
+/* Sizes generated by threads.  Must be long enough to be unique with
+   high probability.  */
+static const int sizes[] = { 12, 15, 16, 17, 24, 31, max_size };
+
+/* Data structure to capture randomness results.  */
+struct blob
+{
+  unsigned int size;
+  int thread_id;
+  unsigned int index;
+  unsigned char bytes[max_size];
+};
+
+#define DYNARRAY_STRUCT dynarray_blob
+#define DYNARRAY_ELEMENT struct blob
+#define DYNARRAY_PREFIX dynarray_blob_
+#include <malloc/dynarray-skeleton.c>
+
+/* Sort blob elements by length first, then by comparing the data
+   member.  */
+static int
+compare_blob (const void *left1, const void *right1)
+{
+  const struct blob *left = left1;
+  const struct blob *right = right1;
+
+  if (left->size != right->size)
+    /* No overflow due to limited range.  */
+    return left->size - right->size;
+  return memcmp (left->bytes, right->bytes, left->size);
+}
+
+/* Used to store the global result.  */
+static pthread_mutex_t global_result_lock = PTHREAD_MUTEX_INITIALIZER;
+static struct dynarray_blob global_result;
+
+/* Copy data to the global result, with locking.  */
+static void
+copy_result_to_global (struct dynarray_blob *result)
+{
+  xpthread_mutex_lock (&global_result_lock);
+  size_t old_size = dynarray_blob_size (&global_result);
+  TEST_VERIFY_EXIT
+    (dynarray_blob_resize (&global_result,
+                           old_size + dynarray_blob_size (result)));
+  memcpy (dynarray_blob_begin (&global_result) + old_size,
+          dynarray_blob_begin (result),
+          dynarray_blob_size (result) * sizeof (struct blob));
+  xpthread_mutex_unlock (&global_result_lock);
+}
+
+/* Used to assign unique thread IDs.  Accessed atomically.  */
+static int next_thread_id;
+
+static void *
+inner_thread (void *unused)
+{
+  /* Use local result to avoid global lock contention while generating
+     randomness.  */
+  struct dynarray_blob result;
+  dynarray_blob_init (&result);
+
+  int thread_id = __atomic_fetch_add (&next_thread_id, 1, __ATOMIC_RELAXED);
+
+  /* Determine the sizes to be used by this thread.  */
+  int size_slot = thread_id % (array_length (sizes) + 1);
+  bool switch_sizes = size_slot == array_length (sizes);
+  if (switch_sizes)
+    size_slot = 0;
+
+  /* Compute the random blobs.  */
+  for (int i = 0; i < count_per_thread; ++i)
+    {
+      struct blob *place = dynarray_blob_emplace (&result);
+      TEST_VERIFY_EXIT (place != NULL);
+      place->size = sizes[size_slot];
+      place->thread_id = thread_id;
+      place->index = i;
+      arc4random_buf (place->bytes, place->size);
+
+      if (switch_sizes)
+        size_slot = (size_slot + 1) % array_length (sizes);
+    }
+
+  /* Store the blobs in the global result structure.  */
+  copy_result_to_global (&result);
+
+  dynarray_blob_free (&result);
+
+  return NULL;
+}
+
+/* Launch the inner threads and wait for their termination.  */
+static void *
+outer_thread (void *unused)
+{
+  for (int round = 0; round < outer_rounds; ++round)
+    {
+      pthread_t threads[inner_threads];
+
+      for (int i = 0; i < inner_threads; ++i)
+        threads[i] = xpthread_create (NULL, inner_thread, NULL);
+
+      for (int i = 0; i < inner_threads; ++i)
+        xpthread_join (threads[i]);
+    }
+
+  return NULL;
+}
+
+static bool termination_requested;
+
+/* Call arc4random_buf to fill one blob with 16 bytes.  */
+static void *
+get_one_blob_thread (void *closure)
+{
+  struct blob *result = closure;
+  result->size = 16;
+  arc4random_buf (result->bytes, result->size);
+  return NULL;
+}
+
+/* Invoked from fork_thread to actually obtain randomness data.  */
+static void
+fork_thread_subprocess (void *closure)
+{
+  struct blob *shared_result = closure;
+
+  pthread_t thr1 = xpthread_create
+    (NULL, get_one_blob_thread, shared_result + 1);
+  pthread_t thr2 = xpthread_create
+    (NULL, get_one_blob_thread, shared_result + 2);
+  get_one_blob_thread (shared_result);
+  xpthread_join (thr1);
+  xpthread_join (thr2);
+}
+
+/* Continuously fork subprocesses to obtain a little bit of
+   randomness.  */
+static void *
+fork_thread (void *unused)
+{
+  struct dynarray_blob result;
+  dynarray_blob_init (&result);
+
+  /* Three blobs from each subprocess.  */
+  struct blob *shared_result
+    = support_shared_allocate (3 * sizeof (*shared_result));
+
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      /* Obtain the results from a subprocess.  */
+      support_isolate_in_subprocess (fork_thread_subprocess, shared_result);
+
+      for (int i = 0; i < 3; ++i)
+        {
+          struct blob *place = dynarray_blob_emplace (&result);
+          TEST_VERIFY_EXIT (place != NULL);
+          place->size = shared_result[i].size;
+          place->thread_id = -1;
+          place->index = i;
+          memcpy (place->bytes, shared_result[i].bytes, place->size);
+        }
+    }
+
+  support_shared_free (shared_result);
+
+  copy_result_to_global (&result);
+  dynarray_blob_free (&result);
+
+  return NULL;
+}
+
+/* Launch the outer threads and wait for their termination.  */
+static void
+run_outer_threads (void)
+{
+  /* Special thread that continuously calls fork.  */
+  pthread_t fork_thread_id = xpthread_create (NULL, fork_thread, NULL);
+
+  pthread_t threads[outer_threads];
+  for (int i = 0; i < outer_threads; ++i)
+    threads[i] = xpthread_create (NULL, outer_thread, NULL);
+
+  for (int i = 0; i < outer_threads; ++i)
+    xpthread_join (threads[i]);
+
+  __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED);
+  xpthread_join (fork_thread_id);
+}
+
+static int
+do_test (void)
+{
+  dynarray_blob_init (&global_result);
+  int expected_blobs
+    = count_per_thread * inner_threads * outer_threads * outer_rounds;
+  printf ("info: minimum of %d blob results expected\n", expected_blobs);
+
+  run_outer_threads ();
+
+  /* The forking thread delivers a non-deterministic number of
+     results, which is why expected_blobs is only a minimum number of
+     results.  */
+  printf ("info: %zu blob results observed\n",
+          dynarray_blob_size (&global_result));
+  TEST_VERIFY (dynarray_blob_size (&global_result) >= expected_blobs);
+
+  /* Verify that there are no duplicates.  */
+  qsort (dynarray_blob_begin (&global_result),
+         dynarray_blob_size (&global_result),
+         sizeof (struct blob), compare_blob);
+  struct blob *end = dynarray_blob_end (&global_result);
+  for (struct blob *p = dynarray_blob_begin (&global_result) + 1;
+       p < end; ++p)
+    {
+      if (compare_blob (p - 1, p) == 0)
+        {
+          support_record_failure ();
+          char *quoted = support_quote_blob (p->bytes, p->size);
+          printf ("error: duplicate blob: \"%s\" (%d bytes)\n",
+                  quoted, (int) p->size);
+          printf ("  first source: thread %d, index %u\n",
+                  p[-1].thread_id, p[-1].index);
+          printf ("  second source: thread %d, index %u\n",
+                  p[0].thread_id, p[0].index);
+          free (quoted);
+        }
+    }
+
+  dynarray_blob_free (&global_result);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.32.0



* [PATCH v3 3/9] benchtests: Add arc4random benchtest
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 2/9] stdlib: Add arc4random tests Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 4/9] aarch64: Add optimized chacha20 Adhemerval Zanella
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

It reports both throughput (bytes obtained per second over the test
duration) and latency, for arc4random and for arc4random_buf with
different sizes.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
---
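For readers comparing the two metrics, the following is a minimal
standalone sketch of how throughput and latency differ.  It does not use
the glibc benchtests harness: fill_bytes, throughput_bytes_per_sec, and
latency_ns_per_call are hypothetical names, a trivial counter pattern
stands in for arc4random_buf, and both loops use a fixed call count
rather than the timer-driven loop of the actual benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for arc4random_buf: fills BUF with a counter
   pattern.  It is NOT random; it only gives the timing loops work.  */
static void
fill_bytes (void *buf, size_t len)
{
  static uint8_t counter;
  volatile uint8_t *p = buf;
  for (size_t i = 0; i < len; i++)
    p[i] = counter++;
}

/* Wall-clock difference between two CLOCK_MONOTONIC samples.  */
static double
elapsed_seconds (struct timespec start, struct timespec end)
{
  return (double) (end.tv_sec - start.tv_sec)
	 + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput: total bytes produced divided by the elapsed time.  */
static double
throughput_bytes_per_sec (size_t len, uint64_t calls)
{
  uint8_t buf[256];
  struct timespec start, end;

  assert (len <= sizeof buf && calls > 0);
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (uint64_t i = 0; i < calls; i++)
    fill_bytes (buf, len);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double secs = elapsed_seconds (start, end);
  return secs > 0.0 ? (double) calls * (double) len / secs : 0.0;
}

/* Latency: average elapsed time of a single call, in nanoseconds.  */
static double
latency_ns_per_call (size_t len, uint64_t calls)
{
  uint8_t buf[256];
  struct timespec start, end;

  assert (len <= sizeof buf && calls > 0);
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (uint64_t i = 0; i < calls; i++)
    fill_bytes (buf, len);
  clock_gettime (CLOCK_MONOTONIC, &end);

  return elapsed_seconds (start, end) * 1e9 / (double) calls;
}
```

Larger buffers tend to raise throughput, since the fixed per-call cost
is spread over more bytes, while the latency of each call grows with
the buffer size.
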
 benchtests/Makefile           |   6 +-
 benchtests/bench-arc4random.c | 224 ++++++++++++++++++++++++++++++++++
 2 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 benchtests/bench-arc4random.c

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 8dfca592fd..50b96dd71f 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -111,8 +111,12 @@ bench-string := \
   ffsll \
 # bench-string
 
+bench-stdlib := \
+  arc4random \
+# bench-stdlib
+
 ifeq (${BENCHSET},)
-bench := $(bench-math) $(bench-pthread) $(bench-string)
+bench := $(bench-math) $(bench-pthread) $(bench-string) $(bench-stdlib)
 else
 bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}})
 endif
diff --git a/benchtests/bench-arc4random.c b/benchtests/bench-arc4random.c
new file mode 100644
index 0000000000..626f2ba48c
--- /dev/null
+++ b/benchtests/bench-arc4random.c
@@ -0,0 +1,224 @@
+/* arc4random benchmarks.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "bench-timing.h"
+#include "json-lib.h"
+#include <array_length.h>
+#include <intprops.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <support/support.h>
+#include <support/timespec.h>
+#include <support/xthread.h>
+
+/* Prevent the compiler from optimizing away the call.  */
+#define DO_NOT_OPTIMIZE(__value)		\
+  ({						\
+    __typeof (__value) __v = (__value);		\
+    asm volatile("" : : "r,m" (__v) : "memory");\
+  })
+
+static volatile sig_atomic_t timer_finished;
+
+static void timer_callback (int unused)
+{
+  timer_finished = 1;
+}
+
+static timer_t timer;
+
+/* Run for approximately DURATION seconds.  It does not matter which
+   thread receives the signal (so no need to mask it on the main thread).  */
+static void
+timer_start (void)
+{
+  timer_finished = 0;
+  timer = support_create_timer (DURATION, 0, false, timer_callback);
+}
+static void
+timer_stop (void)
+{
+  support_delete_timer (timer);
+}
+
+static const uint32_t sizes[] = { 0, 16, 32, 48, 64, 80, 96, 112, 128 };
+
+static double
+bench_throughput (void)
+{
+  uint64_t n = 0;
+
+  struct timespec start, end;
+  clock_gettime (CLOCK_MONOTONIC, &start);
+  while (1)
+    {
+      DO_NOT_OPTIMIZE (arc4random ());
+      n++;
+
+      if (timer_finished == 1)
+	break;
+    }
+  clock_gettime (CLOCK_MONOTONIC, &end);
+  struct timespec diff = timespec_sub (end, start);
+
+  double total = (double) n * sizeof (uint32_t);
+  double duration = (double) diff.tv_sec
+    + (double) diff.tv_nsec / TIMESPEC_HZ;
+
+  return total / duration;
+}
+
+static double
+bench_latency (void)
+{
+  timing_t start, stop, cur;
+  const size_t iters = 1024;
+
+  TIMING_NOW (start);
+  for (size_t i = 0; i < iters; i++)
+    DO_NOT_OPTIMIZE (arc4random ());
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  return (double) (cur) / (double) iters;
+}
+
+static double
+bench_buf_throughput (size_t len)
+{
+  uint8_t buf[len];
+  uint64_t n = 0;
+
+  struct timespec start, end;
+  clock_gettime (CLOCK_MONOTONIC, &start);
+  while (1)
+    {
+      arc4random_buf (buf, len);
+      n++;
+
+      if (timer_finished == 1)
+	break;
+    }
+  clock_gettime (CLOCK_MONOTONIC, &end);
+  struct timespec diff = timespec_sub (end, start);
+
+  double total = (double) n * len;
+  double duration = (double) diff.tv_sec
+    + (double) diff.tv_nsec / TIMESPEC_HZ;
+
+  return total / duration;
+}
+
+static double
+bench_buf_latency (size_t len)
+{
+  timing_t start, stop, cur;
+  const size_t iters = 1024;
+
+  uint8_t buf[len];
+
+  TIMING_NOW (start);
+  for (size_t i = 0; i < iters; i++)
+    arc4random_buf (buf, len);
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  return (double) (cur) / (double) iters;
+}
+
+static void
+bench_singlethread (json_ctx_t *json_ctx)
+{
+  json_element_object_begin (json_ctx);
+
+  json_array_begin (json_ctx, "throughput");
+  for (int i = 0; i < array_length (sizes); i++)
+    {
+      timer_start ();
+      double r = sizes[i] == 0
+	? bench_throughput () : bench_buf_throughput (sizes[i]);
+      timer_stop ();
+
+      json_element_double (json_ctx, r);
+    }
+  json_array_end (json_ctx);
+
+  json_array_begin (json_ctx, "latency");
+  for (int i = 0; i < array_length (sizes); i++)
+    {
+      timer_start ();
+      double r = sizes[i] == 0
+	? bench_latency () : bench_buf_latency (sizes[i]);
+      timer_stop ();
+
+      json_element_double (json_ctx, r);
+    }
+  json_array_end (json_ctx);
+
+  json_element_object_end (json_ctx);
+}
+
+static void
+run_bench (json_ctx_t *json_ctx, const char *name,
+	   char *const*fnames, size_t fnameslen,
+	   void (*bench)(json_ctx_t *ctx))
+{
+  json_attr_object_begin (json_ctx, name);
+  json_array_begin (json_ctx, "functions");
+  for (int i = 0; i < fnameslen; i++)
+    json_element_string (json_ctx, fnames[i]);
+  json_array_end (json_ctx);
+
+  json_array_begin (json_ctx, "results");
+  bench (json_ctx);
+  json_array_end (json_ctx);
+  json_attr_object_end (json_ctx);
+}
+
+static int
+do_test (void)
+{
+  char *fnames[array_length (sizes)];
+  for (int i = 0; i < array_length (sizes); i++)
+    if (sizes[i] == 0)
+      fnames[i] = xasprintf ("arc4random");
+    else
+      fnames[i] = xasprintf ("arc4random_buf(%u)", sizes[i]);
+
+  json_ctx_t json_ctx;
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  run_bench (&json_ctx, "single-thread", fnames, array_length (fnames),
+	     bench_singlethread);
+
+  json_document_end (&json_ctx);
+
+  for (int i = 0; i < array_length (sizes); i++)
+    free (fnames[i]);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.32.0



* [PATCH v3 4/9] aarch64: Add optimized chacha20
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (2 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 3/9] benchtests: Add arc4random benchtest Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 5/9] x86: Add SSE2 " Adhemerval Zanella
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

It adds a vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used by default, and only
little-endian is supported (BE uses the generic fallback code).

As in the generic implementation, the last step, which XORs the
keystream with the input, is omitted.

On a Neoverse-N1 it shows the following improvements (using
formatted bench-arc4random data):

GENERIC
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               149.74
arc4random_buf(16) [single-thread]       259.47
arc4random_buf(32) [single-thread]       325.34
arc4random_buf(48) [single-thread]       347.49
arc4random_buf(64) [single-thread]       361.60
arc4random_buf(80) [single-thread]       371.36
arc4random_buf(96) [single-thread]       383.19
arc4random_buf(112) [single-thread]      386.03
arc4random_buf(128) [single-thread]      388.69
--------------------------------------------------

OPTIMIZED
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               154.98
arc4random_buf(16) [single-thread]       342.63
arc4random_buf(32) [single-thread]       485.91
arc4random_buf(48) [single-thread]       539.95
arc4random_buf(64) [single-thread]       593.38
arc4random_buf(80) [single-thread]       629.45
arc4random_buf(96) [single-thread]       655.78
arc4random_buf(112) [single-thread]      670.54
arc4random_buf(128) [single-thread]      681.65
--------------------------------------------------
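
The two tables can be compared row by row as a ratio of optimized to
generic throughput.  The small helper below is only an illustration:
speedup and print_speedups are hypothetical names, and the rows are a
subset copied from the tables above.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of optimized to generic throughput (both in MB/s).  */
static double
speedup (double generic_mbs, double optimized_mbs)
{
  assert (generic_mbs > 0.0);
  return optimized_mbs / generic_mbs;
}

/* Print the ratio for a few rows taken from the tables above.  */
static void
print_speedups (void)
{
  static const struct
  {
    const char *name;
    double generic;
    double optimized;
  } rows[] =
    {
      { "arc4random", 149.74, 154.98 },
      { "arc4random_buf(16)", 259.47, 342.63 },
      { "arc4random_buf(64)", 361.60, 593.38 },
      { "arc4random_buf(128)", 388.69, 681.65 },
    };

  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf ("%-22s %.2fx\n", rows[i].name,
	    speedup (rows[i].generic, rows[i].optimized));
}
```

Calling print_speedups () from main shows that the gain is about 3% for
plain arc4random and grows to roughly 1.75x for 128-byte arc4random_buf
requests.
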

Checked on aarch64-linux-gnu.
---
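As a note on the omitted XOR step mentioned in the commit message: when
the input is all zero bytes, XORing it with the keystream returns the
keystream unchanged, so the step does no work in that case.  The sketch
below illustrates this with a plain scalar ChaCha20 block function
written from RFC 8439.  It is not the code from this patch, and
sketch_chacha20_block and xor_with_keystream are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QROUND(a, b, c, d)			\
  do {						\
    a += b; d ^= a; d = ROTL32 (d, 16);		\
    c += d; b ^= c; b = ROTL32 (b, 12);		\
    a += b; d ^= a; d = ROTL32 (d, 8);		\
    c += d; b ^= c; b = ROTL32 (b, 7);		\
  } while (0)

/* Read a 32-bit little-endian word.  */
static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
	 | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Produce one 64-byte keystream block (RFC 8439 block function).  */
static void
sketch_chacha20_block (const uint8_t key[32], uint32_t counter,
		       const uint8_t nonce[12], uint8_t out[64])
{
  uint32_t s[16], x[16];

  s[0] = 0x61707865; s[1] = 0x3320646e;
  s[2] = 0x79622d32; s[3] = 0x6b206574;
  for (int i = 0; i < 8; i++)
    s[4 + i] = load_le32 (key + 4 * i);
  s[12] = counter;
  for (int i = 0; i < 3; i++)
    s[13 + i] = load_le32 (nonce + 4 * i);

  memcpy (x, s, sizeof x);
  for (int i = 0; i < 10; i++)
    {
      /* Column rounds.  */
      QROUND (x[0], x[4], x[8], x[12]);
      QROUND (x[1], x[5], x[9], x[13]);
      QROUND (x[2], x[6], x[10], x[14]);
      QROUND (x[3], x[7], x[11], x[15]);
      /* Diagonal rounds.  */
      QROUND (x[0], x[5], x[10], x[15]);
      QROUND (x[1], x[6], x[11], x[12]);
      QROUND (x[2], x[7], x[8], x[13]);
      QROUND (x[3], x[4], x[9], x[14]);
    }

  /* Add the input state and serialize as little-endian bytes.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t v = x[i] + s[i];
      out[4 * i] = (uint8_t) v;
      out[4 * i + 1] = (uint8_t) (v >> 8);
      out[4 * i + 2] = (uint8_t) (v >> 16);
      out[4 * i + 3] = (uint8_t) (v >> 24);
    }
}

/* The classic stream-cipher step: dst = src XOR keystream.  */
static void
xor_with_keystream (uint8_t *dst, const uint8_t *src, const uint8_t *ks,
		    size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] ^ ks[i];
}
```

With a zero-filled src, xor_with_keystream produces a byte-for-byte copy
of ks, which is why a generator that only needs the keystream can write
the block output directly.
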
 LICENSES                        |  20 ++
 stdlib/chacha20.c               |   8 +-
 sysdeps/aarch64/Makefile        |   4 +
 sysdeps/aarch64/chacha20-neon.S | 323 ++++++++++++++++++++++++++++++++
 sysdeps/aarch64/chacha20_arch.h |  40 ++++
 sysdeps/generic/chacha20_arch.h |  24 +++
 6 files changed, 417 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/aarch64/chacha20-neon.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 530893b1dc..7288d281dc 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,3 +389,23 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
+\f
+sysdeps/aarch64/chacha20.S imports code from libgcrypt, with the
+following notices:
+
+Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
+
+This file is part of Libgcrypt.
+
+Libgcrypt is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation; either version 2.1 of
+the License, or (at your option) any later version.
+
+Libgcrypt is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this program; if not, see <http://www.gnu.org/licenses/>.
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
index af4ffa9860..fea4994169 100644
--- a/stdlib/chacha20.c
+++ b/stdlib/chacha20.c
@@ -134,8 +134,9 @@ chacha20_block (uint32_t *state, uint32_t *stream)
 }
 
 static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-		size_t bytes)
+__attribute_maybe_unused__
+chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
+			size_t bytes)
 {
   uint32_t stream[CHACHA20_BLOCK_WORDS];
 
@@ -161,3 +162,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
       memcpy (dst, stream, bytes);
     }
 }
+
+/* Get the architecture optimized version.  */
+#include <chacha20_arch.h>
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7183895d04..812cdc2c00 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -50,6 +50,10 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-neon
+endif
+
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-neon.S b/sysdeps/aarch64/chacha20-neon.S
new file mode 100644
index 0000000000..f5652d5062
--- /dev/null
+++ b/sysdeps/aarch64/chacha20-neon.S
@@ -0,0 +1,323 @@
+/* Optimized AArch64 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+/* Only LE is supported.  */
+#ifdef __AARCH64EL__
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.  */
+
+#define GET_DATA_POINTER(reg, name) \
+        adrp    reg, name ; \
+        add     reg, reg, :lo12:name
+
+/* 'ret' instruction replacement for straight-line speculation mitigation */
+#define ret_spec_stop \
+        ret; dsb sy; isb;
+
+.cpu generic+simd
+
+.text
+
+/* register macros */
+#define INPUT     x0
+#define DST       x1
+#define SRC       x2
+#define NBLKS     x3
+#define ROUND     x4
+#define INPUT_CTR x5
+#define INPUT_POS x6
+#define CTR       x7
+
+/* vector registers */
+#define X0 v16
+#define X4 v17
+#define X8 v18
+#define X12 v19
+
+#define X1 v20
+#define X5 v21
+
+#define X9 v22
+#define X13 v23
+#define X2 v24
+#define X6 v25
+
+#define X3 v26
+#define X7 v27
+#define X11 v28
+#define X15 v29
+
+#define X10 v30
+#define X14 v31
+
+#define VCTR    v0
+#define VTMP0   v1
+#define VTMP1   v2
+#define VTMP2   v3
+#define VTMP3   v4
+#define X12_TMP v5
+#define X13_TMP v6
+#define ROT8    v7
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+#define _(...) __VA_ARGS__
+
+#define vpunpckldq(s1, s2, dst) \
+	zip1 dst.4s, s2.4s, s1.4s;
+
+#define vpunpckhdq(s1, s2, dst) \
+	zip2 dst.4s, s2.4s, s1.4s;
+
+#define vpunpcklqdq(s1, s2, dst) \
+	zip1 dst.2d, s2.2d, s1.2d;
+
+#define vpunpckhqdq(s1, s2, dst) \
+	zip2 dst.2d, s2.2d, s1.2d;
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
+	vpunpckhdq(x1, x0, t2); \
+	vpunpckldq(x1, x0, x0); \
+	\
+	vpunpckldq(x3, x2, t1); \
+	vpunpckhdq(x3, x2, x2); \
+	\
+	vpunpckhqdq(t1, x0, x1); \
+	vpunpcklqdq(t1, x0, x0); \
+	\
+	vpunpckhqdq(x2, t2, x3); \
+	vpunpcklqdq(x2, t2, x2);
+
+#define clear(x) \
+	movi x.16b, #0;
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define XOR(d,s1,s2) \
+	eor d.16b, s2.16b, s1.16b;
+
+#define PLUS(ds,s) \
+	add ds.4s, ds.4s, s.4s;
+
+#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
+	shl dst1.4s, src1.4s, #(c);		\
+	shl dst2.4s, src2.4s, #(c);		\
+	shl dst3.4s, src3.4s, #(c);		\
+	shl dst4.4s, src4.4s, #(c);		\
+	sri dst1.4s, src1.4s, #(32 - (c));	\
+	sri dst2.4s, src2.4s, #(32 - (c));	\
+	sri dst3.4s, src3.4s, #(32 - (c));	\
+	sri dst4.4s, src4.4s, #(32 - (c));
+
+#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+	tbl dst1.16b, {src1.16b}, ROT8.16b;     \
+	tbl dst2.16b, {src2.16b}, ROT8.16b;	\
+	tbl dst3.16b, {src3.16b}, ROT8.16b;	\
+	tbl dst4.16b, {src4.16b}, ROT8.16b;
+
+#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+	rev32 dst1.8h, src1.8h;			\
+	rev32 dst2.8h, src2.8h;			\
+	rev32 dst3.8h, src3.8h;			\
+	rev32 dst4.8h, src4.8h;
+
+#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
+	PLUS(a1,b1); PLUS(a2,b2);						\
+	PLUS(a3,b3); PLUS(a4,b4);						\
+	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
+	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
+		ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4);		\
+	PLUS(c1,d1); PLUS(c2,d2);						\
+	PLUS(c3,d3); PLUS(c4,d4);						\
+	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
+	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
+		ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4)		\
+	PLUS(a1,b1); PLUS(a2,b2);						\
+	PLUS(a3,b3); PLUS(a4,b4);						\
+	    XOR(tmp1,d1,a1); XOR(tmp2,d2,a2);					\
+	    XOR(tmp3,d3,a3); XOR(tmp4,d4,a4);					\
+		ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4)		\
+	PLUS(c1,d1); PLUS(c2,d2);						\
+	PLUS(c3,d3); PLUS(c4,d4);						\
+	    XOR(tmp1,b1,c1); XOR(tmp2,b2,c2);					\
+	    XOR(tmp3,b3,c3); XOR(tmp4,b4,c4);					\
+		ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4)		\
+
+.align 4
+L(__chacha20_blocks4_data_inc_counter):
+	.long 0,1,2,3
+
+.align 4
+L(__chacha20_blocks4_data_rot8):
+	.byte 3,0,1,2
+	.byte 7,4,5,6
+	.byte 11,8,9,10
+	.byte 15,12,13,14
+
+.hidden __chacha20_neon_blocks4
+ENTRY (__chacha20_neon_blocks4)
+	/* input:
+	 *	x0: input
+	 *	x1: dst
+	 *	x2: src
+	 *	x3: nblks (multiple of 4)
+	 */
+
+	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
+	add INPUT_CTR, INPUT, #(12*4);
+	ld1 {ROT8.16b}, [CTR];
+	GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
+	mov INPUT_POS, INPUT;
+	ld1 {VCTR.16b}, [CTR];
+
+L(loop4):
+	/* Construct counter vectors X12 and X13 */
+
+	ld1 {X15.16b}, [INPUT_CTR];
+	mov ROUND, #20;
+	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
+
+	dup X12.4s, X15.s[0];
+	dup X13.4s, X15.s[1];
+	ldr CTR, [INPUT_CTR];
+	add X12.4s, X12.4s, VCTR.4s;
+	dup X0.4s, VTMP1.s[0];
+	dup X1.4s, VTMP1.s[1];
+	dup X2.4s, VTMP1.s[2];
+	dup X3.4s, VTMP1.s[3];
+	dup X14.4s, X15.s[2];
+	cmhi VTMP0.4s, VCTR.4s, X12.4s;
+	dup X15.4s, X15.s[3];
+	add CTR, CTR, #4; /* Update counter */
+	dup X4.4s, VTMP2.s[0];
+	dup X5.4s, VTMP2.s[1];
+	dup X6.4s, VTMP2.s[2];
+	dup X7.4s, VTMP2.s[3];
+	sub X13.4s, X13.4s, VTMP0.4s;
+	dup X8.4s, VTMP3.s[0];
+	dup X9.4s, VTMP3.s[1];
+	dup X10.4s, VTMP3.s[2];
+	dup X11.4s, VTMP3.s[3];
+	mov X12_TMP.16b, X12.16b;
+	mov X13_TMP.16b, X13.16b;
+	str CTR, [INPUT_CTR];
+
+L(round2):
+	subs ROUND, ROUND, #2
+	QUARTERROUND4(X0, X4,  X8, X12,   X1, X5,  X9, X13,
+		      X2, X6, X10, X14,   X3, X7, X11, X15,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+	QUARTERROUND4(X0, X5, X10, X15,   X1, X6, X11, X12,
+		      X2, X7,  X8, X13,   X3, X4,  X9, X14,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+	b.ne L(round2);
+
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
+
+	PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
+	PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
+	PLUS(X0, VTMP2);
+	PLUS(X1, VTMP3);
+	PLUS(X2, X12_TMP);
+	PLUS(X3, X13_TMP);
+
+	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
+	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
+	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
+	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
+	mov INPUT_POS, INPUT;
+	PLUS(X4, VTMP2);
+	PLUS(X5, VTMP3);
+	PLUS(X6, X12_TMP);
+	PLUS(X7, X13_TMP);
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
+	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
+	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
+	PLUS(X8, VTMP2);
+	PLUS(X9, VTMP3);
+	PLUS(X10, X12_TMP);
+	PLUS(X11, X13_TMP);
+	PLUS(X14, VTMP0);
+	PLUS(X15, VTMP1);
+
+	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
+
+	subs NBLKS, NBLKS, #4;
+
+	st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
+	st1 {X1.16b,X5.16b}, [DST], #32;
+	st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
+	st1 {X10.16b,X14.16b}, [DST], #32;
+	st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
+
+	b.ne L(loop4);
+
+	/* clear the used vector registers and stack */
+	clear(VTMP0);
+	clear(VTMP1);
+	clear(VTMP2);
+	clear(VTMP3);
+	clear(X12_TMP);
+	clear(X13_TMP);
+	clear(X0);
+	clear(X1);
+	clear(X2);
+	clear(X3);
+	clear(X4);
+	clear(X5);
+	clear(X6);
+	clear(X7);
+	clear(X8);
+	clear(X9);
+	clear(X10);
+	clear(X11);
+	clear(X12);
+	clear(X13);
+	clear(X14);
+	clear(X15);
+
+	eor x0, x0, x0
+	ret_spec_stop
+END (__chacha20_neon_blocks4)
+
+#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
new file mode 100644
index 0000000000..9febee7bb6
--- /dev/null
+++ b/sysdeps/aarch64/chacha20_arch.h
@@ -0,0 +1,40 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <stdbool.h>
+
+unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
+				      const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
+		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
+#ifdef __AARCH64EL__
+  __chacha20_neon_blocks4 (state, dst, src,
+			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+#else
+  chacha20_crypt_generic (state, dst, src, bytes);
+#endif
+}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
new file mode 100644
index 0000000000..efad41d034
--- /dev/null
+++ b/sysdeps/generic/chacha20_arch.h
@@ -0,0 +1,24 @@
+/* Chacha20 implementation, generic interface for encrypt.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+static inline void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  chacha20_crypt_generic (state, dst, src, bytes);
+}
-- 
2.32.0


* [PATCH v3 5/9] x86: Add SSE2 optimized chacha20
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (3 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 4/9] aarch64: Add optimized chacha20 Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 6/9] x86: Add AVX2 " Adhemerval Zanella
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

This adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-ssse3.S.  It replaces ROTATE_SHUF_2 (which
uses pshufb, an SSSE3 instruction) with ROTATE2, so the resulting
implementation requires only SSE2.
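
As a scalar illustration of what ROTATE2 computes per 32-bit lane,
the sketch below builds a left rotate from two shifts and an add.
The function name sketch_rotl32 is hypothetical.  It assumes
0 < c < 32, which holds for the rotate counts 16, 12, 8, and 7 used
by the quarter round.

```c
#include <stdint.h>

/* Rotate V left by C bits using only shifts and an add.  The two
   shifted halves have no set bits in common, so adding them gives
   the same result as OR-ing them.  Requires 0 < C < 32.  */
static inline uint32_t
sketch_rotl32 (uint32_t v, unsigned int c)
{
  uint32_t hi = v << c;         /* Bits that stay in the word.  */
  uint32_t lo = v >> (32 - c);  /* Bits that wrap around.  */
  return hi + lo;
}
```

The vector version trades the single pshufb byte shuffle for two
shifts and an add per rotate, which is what makes it plain SSE2.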

As in the generic implementation, the final step that XORs the
keystream with the input is omitted.

On a Ryzen 9 5900X it shows the following improvements, up to about
2.46x for arc4random_buf(128) (using formatted bench-arc4random data):

GENERIC
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               384.43
arc4random_buf(16) [single-thread]       465.00
arc4random_buf(32) [single-thread]       528.32
arc4random_buf(48) [single-thread]       546.26
arc4random_buf(64) [single-thread]       560.57
arc4random_buf(80) [single-thread]       562.37
arc4random_buf(96) [single-thread]       572.05
arc4random_buf(112) [single-thread]      573.12
arc4random_buf(128) [single-thread]      578.12
--------------------------------------------------

SSE2:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               637.06
arc4random_buf(16) [single-thread]       856.62
arc4random_buf(32) [single-thread]       1129.41
arc4random_buf(48) [single-thread]       1260.61
arc4random_buf(64) [single-thread]       1330.56
arc4random_buf(80) [single-thread]       1353.84
arc4random_buf(96) [single-thread]       1376.53
arc4random_buf(112) [single-thread]      1405.74
arc4random_buf(128) [single-thread]      1422.59
--------------------------------------------------

Checked on x86_64-linux-gnu.
---
 LICENSES                       |   4 +-
 sysdeps/x86_64/Makefile        |   6 +
 sysdeps/x86_64/chacha20-sse2.S | 311 +++++++++++++++++++++++++++++++++
 sysdeps/x86_64/chacha20_arch.h |  38 ++++
 4 files changed, 357 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/x86_64/chacha20-sse2.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 7288d281dc..415991e208 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,8 +390,8 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
 \f
-sysdeps/aarch64/chacha20.S imports code from libgcrypt, with the
-following notices:
+sysdeps/aarch64/chacha20.S and sysdeps/x86_64/chacha20-sse2.S
+import code from libgcrypt, with the following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
 
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 79365aff2a..c8fbc30857 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,6 +5,12 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
+ifeq ($(subdir),stdlib)
+sysdep_routines += \
+  chacha20-sse2 \
+  # sysdep_routines
+endif
+
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-sse2.S b/sysdeps/x86_64/chacha20-sse2.S
new file mode 100644
index 0000000000..3a4cb7d2ea
--- /dev/null
+++ b/sysdeps/x86_64/chacha20-sse2.S
@@ -0,0 +1,311 @@
+/* Optimized SSE2 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.  */
+
+#include <sysdep.h>
+
+#ifdef PIC
+#  define rRIP (%rip)
+#else
+#  define rRIP
+#endif
+
+/* 'ret' instruction replacement for straight-line speculation mitigation */
+#define ret_spec_stop \
+        ret; int3;
+
+/* register macros */
+#define INPUT %rdi
+#define DST   %rsi
+#define SRC   %rdx
+#define NBLKS %rcx
+#define ROUND %eax
+
+/* stack structure */
+#define STACK_VEC_X12 (16)
+#define STACK_VEC_X13 (16 + STACK_VEC_X12)
+#define STACK_TMP     (16 + STACK_VEC_X13)
+#define STACK_TMP1    (16 + STACK_TMP)
+#define STACK_TMP2    (16 + STACK_TMP1)
+
+#define STACK_MAX     (16 + STACK_TMP2)
+
+/* vector registers */
+#define X0 %xmm0
+#define X1 %xmm1
+#define X2 %xmm2
+#define X3 %xmm3
+#define X4 %xmm4
+#define X5 %xmm5
+#define X6 %xmm6
+#define X7 %xmm7
+#define X8 %xmm8
+#define X9 %xmm9
+#define X10 %xmm10
+#define X11 %xmm11
+#define X12 %xmm12
+#define X13 %xmm13
+#define X14 %xmm14
+#define X15 %xmm15
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+/* 4x4 32-bit integer matrix transpose */
+#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \
+	movdqa    x0, t2; \
+	punpckhdq x1, t2; \
+	punpckldq x1, x0; \
+	\
+	movdqa    x2, t1; \
+	punpckldq x3, t1; \
+	punpckhdq x3, x2; \
+	\
+	movdqa     x0, x1; \
+	punpckhqdq t1, x1; \
+	punpcklqdq t1, x0; \
+	\
+	movdqa     t2, x3; \
+	punpckhqdq x2, x3; \
+	punpcklqdq x2, t2; \
+	movdqa     t2, x2;
+
+/* fill xmm register with 32-bit value from memory */
+#define PBROADCASTD(mem32, xreg) \
+	movd mem32, xreg; \
+	pshufd $0, xreg, xreg;
+
+#define CLEAR(x) pxor x,x;
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define ROTATE2(v1,v2,c,tmp1,tmp2)	\
+	movdqa v1, tmp1; 		\
+	movdqa v2, tmp2; 		\
+	psrld $(32 - (c)), v1;		\
+	pslld $(c), tmp1;		\
+	paddb tmp1, v1;			\
+	psrld $(32 - (c)), v2;		\
+	pslld $(c), tmp2;		\
+	paddb tmp2, v2;
+
+#define XOR(ds,s) \
+	pxor s, ds;
+
+#define PLUS(ds,s) \
+	paddd s, ds;
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2)	\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE2(d1, d2, 16, tmp1, tmp2);			\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE2(b1, b2, 12, tmp1, tmp2);			\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE2(d1, d2, 8, tmp1, tmp2);			\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE2(b1, b2,  7, tmp1, tmp2);
+
+	.section .text.sse2,"ax",@progbits
+
+chacha20_data:
+	.align 16
+L(counter1):
+	.long 1,0,0,0
+L(inc_counter):
+	.long 0,1,2,3
+L(unsigned_cmp):
+	.long 0x80000000,0x80000000,0x80000000,0x80000000
+
+	.hidden __chacha20_sse2_blocks4
+ENTRY (__chacha20_sse2_blocks4)
+	/* input:
+	 *	%rdi: input
+	 *	%rsi: dst
+	 *	%rdx: src
+	 *	%rcx: nblks (multiple of 4)
+	 */
+
+	pushq %rbp;
+	cfi_adjust_cfa_offset(8);
+	cfi_rel_offset(rbp, 0)
+	movq %rsp, %rbp;
+	cfi_def_cfa_register(%rbp);
+
+	subq $STACK_MAX, %rsp;
+	andq $~15, %rsp;
+
+L(loop4):
+	mov $20, ROUND;
+
+	/* Construct counter vectors X12 and X13 */
+	movdqa L(inc_counter) rRIP, X0;
+	movdqa L(unsigned_cmp) rRIP, X2;
+	PBROADCASTD((12 * 4)(INPUT), X12);
+	PBROADCASTD((13 * 4)(INPUT), X13);
+	paddd X0, X12;
+	movdqa X12, X1;
+	pxor X2, X0;
+	pxor X2, X1;
+	pcmpgtd X1, X0;
+	psubd X0, X13;
+	movdqa X12, (STACK_VEC_X12)(%rsp);
+	movdqa X13, (STACK_VEC_X13)(%rsp);
+
+	/* Load vectors */
+	PBROADCASTD((0 * 4)(INPUT), X0);
+	PBROADCASTD((1 * 4)(INPUT), X1);
+	PBROADCASTD((2 * 4)(INPUT), X2);
+	PBROADCASTD((3 * 4)(INPUT), X3);
+	PBROADCASTD((4 * 4)(INPUT), X4);
+	PBROADCASTD((5 * 4)(INPUT), X5);
+	PBROADCASTD((6 * 4)(INPUT), X6);
+	PBROADCASTD((7 * 4)(INPUT), X7);
+	PBROADCASTD((8 * 4)(INPUT), X8);
+	PBROADCASTD((9 * 4)(INPUT), X9);
+	PBROADCASTD((10 * 4)(INPUT), X10);
+	PBROADCASTD((11 * 4)(INPUT), X11);
+	PBROADCASTD((14 * 4)(INPUT), X14);
+	PBROADCASTD((15 * 4)(INPUT), X15);
+	movdqa X11, (STACK_TMP)(%rsp);
+	movdqa X15, (STACK_TMP1)(%rsp);
+
+L(round2_4):
+	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X11,X15)
+	movdqa (STACK_TMP)(%rsp), X11;
+	movdqa (STACK_TMP1)(%rsp), X15;
+	movdqa X8, (STACK_TMP)(%rsp);
+	movdqa X9, (STACK_TMP1)(%rsp);
+	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,X9)
+	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,X9)
+	movdqa (STACK_TMP)(%rsp), X8;
+	movdqa (STACK_TMP1)(%rsp), X9;
+	movdqa X11, (STACK_TMP)(%rsp);
+	movdqa X15, (STACK_TMP1)(%rsp);
+	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X11,X15)
+	sub $2, ROUND;
+	jnz L(round2_4);
+
+	/* tmp := X15 */
+	movdqa (STACK_TMP)(%rsp), X11;
+	PBROADCASTD((0 * 4)(INPUT), X15);
+	PLUS(X0, X15);
+	PBROADCASTD((1 * 4)(INPUT), X15);
+	PLUS(X1, X15);
+	PBROADCASTD((2 * 4)(INPUT), X15);
+	PLUS(X2, X15);
+	PBROADCASTD((3 * 4)(INPUT), X15);
+	PLUS(X3, X15);
+	PBROADCASTD((4 * 4)(INPUT), X15);
+	PLUS(X4, X15);
+	PBROADCASTD((5 * 4)(INPUT), X15);
+	PLUS(X5, X15);
+	PBROADCASTD((6 * 4)(INPUT), X15);
+	PLUS(X6, X15);
+	PBROADCASTD((7 * 4)(INPUT), X15);
+	PLUS(X7, X15);
+	PBROADCASTD((8 * 4)(INPUT), X15);
+	PLUS(X8, X15);
+	PBROADCASTD((9 * 4)(INPUT), X15);
+	PLUS(X9, X15);
+	PBROADCASTD((10 * 4)(INPUT), X15);
+	PLUS(X10, X15);
+	PBROADCASTD((11 * 4)(INPUT), X15);
+	PLUS(X11, X15);
+	movdqa (STACK_VEC_X12)(%rsp), X15;
+	PLUS(X12, X15);
+	movdqa (STACK_VEC_X13)(%rsp), X15;
+	PLUS(X13, X15);
+	movdqa X13, (STACK_TMP)(%rsp);
+	PBROADCASTD((14 * 4)(INPUT), X15);
+	PLUS(X14, X15);
+	movdqa (STACK_TMP1)(%rsp), X15;
+	movdqa X14, (STACK_TMP1)(%rsp);
+	PBROADCASTD((15 * 4)(INPUT), X13);
+	PLUS(X15, X13);
+	movdqa X15, (STACK_TMP2)(%rsp);
+
+	/* Update counter */
+	addq $4, (12 * 4)(INPUT);
+
+	TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15);
+	movdqu X0, (64 * 0 + 16 * 0)(DST)
+	movdqu X1, (64 * 1 + 16 * 0)(DST)
+	movdqu X2, (64 * 2 + 16 * 0)(DST)
+	movdqu X3, (64 * 3 + 16 * 0)(DST)
+	TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2);
+	movdqa (STACK_TMP)(%rsp), X13;
+	movdqa (STACK_TMP1)(%rsp), X14;
+	movdqa (STACK_TMP2)(%rsp), X15;
+	movdqu X4, (64 * 0 + 16 * 1)(DST)
+	movdqu X5, (64 * 1 + 16 * 1)(DST)
+	movdqu X6, (64 * 2 + 16 * 1)(DST)
+	movdqu X7, (64 * 3 + 16 * 1)(DST)
+	TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2);
+	movdqu X8,  (64 * 0 + 16 * 2)(DST)
+	movdqu X9,  (64 * 1 + 16 * 2)(DST)
+	movdqu X10, (64 * 2 + 16 * 2)(DST)
+	movdqu X11, (64 * 3 + 16 * 2)(DST)
+	TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2);
+	movdqu X12, (64 * 0 + 16 * 3)(DST)
+	movdqu X13, (64 * 1 + 16 * 3)(DST)
+	movdqu X14, (64 * 2 + 16 * 3)(DST)
+	movdqu X15, (64 * 3 + 16 * 3)(DST)
+
+	sub $4, NBLKS;
+	lea (4 * 64)(DST), DST;
+	lea (4 * 64)(SRC), SRC;
+	jnz L(loop4);
+
+	/* CLEAR the used vector registers and stack */
+	CLEAR(X0);
+	movdqa X0, (STACK_VEC_X12)(%rsp);
+	movdqa X0, (STACK_VEC_X13)(%rsp);
+	movdqa X0, (STACK_TMP)(%rsp);
+	movdqa X0, (STACK_TMP1)(%rsp);
+	movdqa X0, (STACK_TMP2)(%rsp);
+	CLEAR(X1);
+	CLEAR(X2);
+	CLEAR(X3);
+	CLEAR(X4);
+	CLEAR(X5);
+	CLEAR(X6);
+	CLEAR(X7);
+	CLEAR(X8);
+	CLEAR(X9);
+	CLEAR(X10);
+	CLEAR(X11);
+	CLEAR(X12);
+	CLEAR(X13);
+	CLEAR(X14);
+	CLEAR(X15);
+
+	/* eax zeroed by round loop. */
+	leave;
+	cfi_adjust_cfa_offset(-8)
+	cfi_def_cfa_register(%rsp);
+	ret_spec_stop;
+END (__chacha20_sse2_blocks4)
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
new file mode 100644
index 0000000000..5738c840a9
--- /dev/null
+++ b/sysdeps/x86_64/chacha20_arch.h
@@ -0,0 +1,38 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <cpu-features.h>
+#include <sys/param.h>
+
+unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
+				      const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static inline void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+  __chacha20_sse2_blocks4 (state, dst, src,
+			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+}
-- 
2.32.0


* [PATCH v3 6/9] x86: Add AVX2 optimized chacha20
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (4 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 5/9] x86: Add SSE2 " Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 7/9] powerpc64: Add " Adhemerval Zanella
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

This adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.
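
The run-time selection described above can be sketched in plain C as
follows.  This is only an illustration: it uses the GCC/Clang builtin
__builtin_cpu_supports rather than glibc's internal cpu-features
interface, and every sketch_* name is hypothetical, standing in for
the real block functions.

```c
/* Hypothetical stand-ins for the SSE2 and AVX2 block functions.  */
static const char *sketch_sse2 (void) { return "sse2"; }
static const char *sketch_avx2 (void) { return "avx2"; }

typedef const char *(*sketch_impl_fn) (void);

/* Return nonzero if the running CPU reports usable AVX2.  On non-x86
   targets, or without the builtin, report no AVX2.  */
static int
sketch_cpu_has_avx2 (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  return __builtin_cpu_supports ("avx2") != 0;
#else
  return 0;
#endif
}

/* Prefer AVX2 when available; SSE2 is the x86_64 baseline fallback.  */
static sketch_impl_fn
sketch_select_impl (void)
{
  return sketch_cpu_has_avx2 () ? sketch_avx2 : sketch_sse2;
}
```

Because SSE2 is part of the x86_64 baseline, the fallback needs no
check of its own.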

As in the generic implementation, the final step that XORs the
keystream with the input is omitted.

On a Ryzen 9 5900X it shows the following improvements, up to about
1.93x for arc4random_buf(128) (using formatted bench-arc4random data):

SSE2:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               637.06
arc4random_buf(16) [single-thread]       856.62
arc4random_buf(32) [single-thread]       1129.41
arc4random_buf(48) [single-thread]       1260.61
arc4random_buf(64) [single-thread]       1330.56
arc4random_buf(80) [single-thread]       1353.84
arc4random_buf(96) [single-thread]       1376.53
arc4random_buf(112) [single-thread]      1405.74
arc4random_buf(128) [single-thread]      1422.59
--------------------------------------------------

AVX2:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               809.53
arc4random_buf(16) [single-thread]       1242.56
arc4random_buf(32) [single-thread]       1915.90
arc4random_buf(48) [single-thread]       2230.03
arc4random_buf(64) [single-thread]       2429.68
arc4random_buf(80) [single-thread]       2489.70
arc4random_buf(96) [single-thread]       2598.88
arc4random_buf(112) [single-thread]      2699.93
arc4random_buf(128) [single-thread]      2747.31
--------------------------------------------------

Checked on x86_64-linux-gnu.
---
 LICENSES                       |   5 +-
 sysdeps/x86_64/Makefile        |   1 +
 sysdeps/x86_64/chacha20-avx2.S | 313 +++++++++++++++++++++++++++++++++
 sysdeps/x86_64/chacha20_arch.h |  22 ++-
 4 files changed, 333 insertions(+), 8 deletions(-)
 create mode 100644 sysdeps/x86_64/chacha20-avx2.S

diff --git a/LICENSES b/LICENSES
index 415991e208..05a5c07fcf 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,8 +390,9 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
 \f
-sysdeps/aarch64/chacha20.S and sysdeps/x86_64/chacha20-sse2.S
-import code from libgcrypt, with the following notices:
+sysdeps/aarch64/chacha20.S, sysdeps/x86_64/chacha20-sse2.S, and
+sysdeps/x86_64/chacha20-avx2.S import code from libgcrypt, with the
+following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
 
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index c8fbc30857..0fa8897404 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -8,6 +8,7 @@ endif
 ifeq ($(subdir),stdlib)
 sysdep_routines += \
   chacha20-sse2 \
+  chacha20-avx2 \
   # sysdep_routines
 endif
 
diff --git a/sysdeps/x86_64/chacha20-avx2.S b/sysdeps/x86_64/chacha20-avx2.S
new file mode 100644
index 0000000000..fb76865890
--- /dev/null
+++ b/sysdeps/x86_64/chacha20-avx2.S
@@ -0,0 +1,313 @@
+/* Optimized AVX2 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.  */
+
+#ifdef PIC
+#  define rRIP (%rip)
+#else
+#  define rRIP
+#endif
+
+/* register macros */
+#define INPUT %rdi
+#define DST   %rsi
+#define SRC   %rdx
+#define NBLKS %rcx
+#define ROUND %eax
+
+/* stack structure */
+#define STACK_VEC_X12 (32)
+#define STACK_VEC_X13 (32 + STACK_VEC_X12)
+#define STACK_TMP     (32 + STACK_VEC_X13)
+#define STACK_TMP1    (32 + STACK_TMP)
+
+#define STACK_MAX     (32 + STACK_TMP1)
+
+/* vector registers */
+#define X0 %ymm0
+#define X1 %ymm1
+#define X2 %ymm2
+#define X3 %ymm3
+#define X4 %ymm4
+#define X5 %ymm5
+#define X6 %ymm6
+#define X7 %ymm7
+#define X8 %ymm8
+#define X9 %ymm9
+#define X10 %ymm10
+#define X11 %ymm11
+#define X12 %ymm12
+#define X13 %ymm13
+#define X14 %ymm14
+#define X15 %ymm15
+
+#define X0h %xmm0
+#define X1h %xmm1
+#define X2h %xmm2
+#define X3h %xmm3
+#define X4h %xmm4
+#define X5h %xmm5
+#define X6h %xmm6
+#define X7h %xmm7
+#define X8h %xmm8
+#define X9h %xmm9
+#define X10h %xmm10
+#define X11h %xmm11
+#define X12h %xmm12
+#define X13h %xmm13
+#define X14h %xmm14
+#define X15h %xmm15
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
+	vpunpckhdq x1, x0, t2; \
+	vpunpckldq x1, x0, x0; \
+	\
+	vpunpckldq x3, x2, t1; \
+	vpunpckhdq x3, x2, x2; \
+	\
+	vpunpckhqdq t1, x0, x1; \
+	vpunpcklqdq t1, x0, x0; \
+	\
+	vpunpckhqdq x2, t2, x3; \
+	vpunpcklqdq x2, t2, x2;
+
+/* 2x2 128-bit matrix transpose */
+#define transpose_16byte_2x2(x0,x1,t1) \
+	vmovdqa    x0, t1; \
+	vperm2i128 $0x20, x1, x0, x0; \
+	vperm2i128 $0x31, x1, t1, x1;
+
+/**********************************************************************
+  8-way chacha20
+ **********************************************************************/
+
+#define ROTATE2(v1,v2,c,tmp)	\
+	vpsrld $(32 - (c)), v1, tmp;	\
+	vpslld $(c), v1, v1;		\
+	vpaddb tmp, v1, v1;		\
+	vpsrld $(32 - (c)), v2, tmp;	\
+	vpslld $(c), v2, v2;		\
+	vpaddb tmp, v2, v2;
+
+#define ROTATE_SHUF_2(v1,v2,shuf)	\
+	vpshufb shuf, v1, v1;		\
+	vpshufb shuf, v2, v2;
+
+#define XOR(ds,s) \
+	vpxor s, ds, ds;
+
+#define PLUS(ds,s) \
+	vpaddd s, ds, ds;
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
+		      interleave_op1,interleave_op2,\
+		      interleave_op3,interleave_op4)		\
+	vbroadcasti128 .Lshuf_rol16 rRIP, tmp1;			\
+		interleave_op1;					\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE_SHUF_2(d1, d2, tmp1);			\
+		interleave_op2;					\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE2(b1, b2, 12, tmp1);				\
+	vbroadcasti128 .Lshuf_rol8 rRIP, tmp1;			\
+		interleave_op3;					\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE_SHUF_2(d1, d2, tmp1);			\
+		interleave_op4;					\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE2(b1, b2,  7, tmp1);
+
+	.section .text.avx2, "ax", @progbits
+	.align 32
+chacha20_data:
+L(shuf_rol16):
+	.byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
+L(shuf_rol8):
+	.byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
+L(inc_counter):
+	.byte 0,1,2,3,4,5,6,7
+L(unsigned_cmp):
+	.long 0x80000000
+
+	.hidden __chacha20_avx2_blocks8
+ENTRY (__chacha20_avx2_blocks8)
+	/* input:
+	 *	%rdi: input
+	 *	%rsi: dst
+	 *	%rdx: src
+	 *	%rcx: nblks (multiple of 8)
+	 */
+	vzeroupper;
+
+	pushq %rbp;
+	cfi_adjust_cfa_offset(8);
+	cfi_rel_offset(rbp, 0)
+	movq %rsp, %rbp;
+	cfi_def_cfa_register(rbp);
+
+	subq $STACK_MAX, %rsp;
+	andq $~31, %rsp;
+
+L(loop8):
+	mov $20, ROUND;
+
+	/* Construct counter vectors X12 and X13 */
+	vpmovzxbd L(inc_counter) rRIP, X0;
+	vpbroadcastd L(unsigned_cmp) rRIP, X2;
+	vpbroadcastd (12 * 4)(INPUT), X12;
+	vpbroadcastd (13 * 4)(INPUT), X13;
+	vpaddd X0, X12, X12;
+	vpxor X2, X0, X0;
+	vpxor X2, X12, X1;
+	vpcmpgtd X1, X0, X0;
+	vpsubd X0, X13, X13;
+	vmovdqa X12, (STACK_VEC_X12)(%rsp);
+	vmovdqa X13, (STACK_VEC_X13)(%rsp);
+
+	/* Load vectors */
+	vpbroadcastd (0 * 4)(INPUT), X0;
+	vpbroadcastd (1 * 4)(INPUT), X1;
+	vpbroadcastd (2 * 4)(INPUT), X2;
+	vpbroadcastd (3 * 4)(INPUT), X3;
+	vpbroadcastd (4 * 4)(INPUT), X4;
+	vpbroadcastd (5 * 4)(INPUT), X5;
+	vpbroadcastd (6 * 4)(INPUT), X6;
+	vpbroadcastd (7 * 4)(INPUT), X7;
+	vpbroadcastd (8 * 4)(INPUT), X8;
+	vpbroadcastd (9 * 4)(INPUT), X9;
+	vpbroadcastd (10 * 4)(INPUT), X10;
+	vpbroadcastd (11 * 4)(INPUT), X11;
+	vpbroadcastd (14 * 4)(INPUT), X14;
+	vpbroadcastd (15 * 4)(INPUT), X15;
+	vmovdqa X15, (STACK_TMP)(%rsp);
+
+L(round2):
+	QUARTERROUND2(X0, X4,  X8, X12,   X1, X5,  X9, X13, tmp:=,X15,,,,)
+	vmovdqa (STACK_TMP)(%rsp), X15;
+	vmovdqa X8, (STACK_TMP)(%rsp);
+	QUARTERROUND2(X2, X6, X10, X14,   X3, X7, X11, X15, tmp:=,X8,,,,)
+	QUARTERROUND2(X0, X5, X10, X15,   X1, X6, X11, X12, tmp:=,X8,,,,)
+	vmovdqa (STACK_TMP)(%rsp), X8;
+	vmovdqa X15, (STACK_TMP)(%rsp);
+	QUARTERROUND2(X2, X7,  X8, X13,   X3, X4,  X9, X14, tmp:=,X15,,,,)
+	sub $2, ROUND;
+	jnz L(round2);
+
+	vmovdqa X8, (STACK_TMP1)(%rsp);
+
+	/* tmp := X15 */
+	vpbroadcastd (0 * 4)(INPUT), X15;
+	PLUS(X0, X15);
+	vpbroadcastd (1 * 4)(INPUT), X15;
+	PLUS(X1, X15);
+	vpbroadcastd (2 * 4)(INPUT), X15;
+	PLUS(X2, X15);
+	vpbroadcastd (3 * 4)(INPUT), X15;
+	PLUS(X3, X15);
+	vpbroadcastd (4 * 4)(INPUT), X15;
+	PLUS(X4, X15);
+	vpbroadcastd (5 * 4)(INPUT), X15;
+	PLUS(X5, X15);
+	vpbroadcastd (6 * 4)(INPUT), X15;
+	PLUS(X6, X15);
+	vpbroadcastd (7 * 4)(INPUT), X15;
+	PLUS(X7, X15);
+	transpose_4x4(X0, X1, X2, X3, X8, X15);
+	transpose_4x4(X4, X5, X6, X7, X8, X15);
+	vmovdqa (STACK_TMP1)(%rsp), X8;
+	transpose_16byte_2x2(X0, X4, X15);
+	transpose_16byte_2x2(X1, X5, X15);
+	transpose_16byte_2x2(X2, X6, X15);
+	transpose_16byte_2x2(X3, X7, X15);
+	vmovdqa (STACK_TMP)(%rsp), X15;
+	vmovdqu X0, (64 * 0 + 16 * 0)(DST)
+	vmovdqu X1, (64 * 1 + 16 * 0)(DST)
+	vpbroadcastd (8 * 4)(INPUT), X0;
+	PLUS(X8, X0);
+	vpbroadcastd (9 * 4)(INPUT), X0;
+	PLUS(X9, X0);
+	vpbroadcastd (10 * 4)(INPUT), X0;
+	PLUS(X10, X0);
+	vpbroadcastd (11 * 4)(INPUT), X0;
+	PLUS(X11, X0);
+	vmovdqa (STACK_VEC_X12)(%rsp), X0;
+	PLUS(X12, X0);
+	vmovdqa (STACK_VEC_X13)(%rsp), X0;
+	PLUS(X13, X0);
+	vpbroadcastd (14 * 4)(INPUT), X0;
+	PLUS(X14, X0);
+	vpbroadcastd (15 * 4)(INPUT), X0;
+	PLUS(X15, X0);
+	vmovdqu X2, (64 * 2 + 16 * 0)(DST)
+	vmovdqu X3, (64 * 3 + 16 * 0)(DST)
+
+	/* Update counter */
+	addq $8, (12 * 4)(INPUT);
+
+	transpose_4x4(X8, X9, X10, X11, X0, X1);
+	transpose_4x4(X12, X13, X14, X15, X0, X1);
+	vmovdqu X4, (64 * 4 + 16 * 0)(DST)
+	vmovdqu X5, (64 * 5 + 16 * 0)(DST)
+	transpose_16byte_2x2(X8, X12, X0);
+	transpose_16byte_2x2(X9, X13, X0);
+	transpose_16byte_2x2(X10, X14, X0);
+	transpose_16byte_2x2(X11, X15, X0);
+	vmovdqu X6,  (64 * 6 + 16 * 0)(DST)
+	vmovdqu X7,  (64 * 7 + 16 * 0)(DST)
+	vmovdqu X8,  (64 * 0 + 16 * 2)(DST)
+	vmovdqu X9,  (64 * 1 + 16 * 2)(DST)
+	vmovdqu X10, (64 * 2 + 16 * 2)(DST)
+	vmovdqu X11, (64 * 3 + 16 * 2)(DST)
+	vmovdqu X12, (64 * 4 + 16 * 2)(DST)
+	vmovdqu X13, (64 * 5 + 16 * 2)(DST)
+	vmovdqu X14, (64 * 6 + 16 * 2)(DST)
+	vmovdqu X15, (64 * 7 + 16 * 2)(DST)
+
+	sub $8, NBLKS;
+	lea (8 * 64)(DST), DST;
+	lea (8 * 64)(SRC), SRC;
+	jnz L(loop8);
+
+	/* clear the used vector registers and stack */
+	vpxor X0, X0, X0;
+	vmovdqa X0, (STACK_VEC_X12)(%rsp);
+	vmovdqa X0, (STACK_VEC_X13)(%rsp);
+	vmovdqa X0, (STACK_TMP)(%rsp);
+	vmovdqa X0, (STACK_TMP1)(%rsp);
+	vzeroall;
+
+	/* eax zeroed by round loop. */
+	leave;
+	cfi_adjust_cfa_offset(-8)
+	cfi_def_cfa_register(%rsp);
+	ret;
+	int3;
+END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
index 5738c840a9..bfdc6c0a36 100644
--- a/sysdeps/x86_64/chacha20_arch.h
+++ b/sysdeps/x86_64/chacha20_arch.h
@@ -23,16 +23,26 @@
 unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst,
 				      const uint8_t *src, size_t nblks)
      attribute_hidden;
+unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
+				      const uint8_t *src, size_t nblks)
+     attribute_hidden;
 
 static inline void
 chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
 		size_t bytes)
 {
-  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
-		  "CHACHA20_BUFSIZE not multiple of 4");
-  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
-		  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4 or 8");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
+		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
+  const struct cpu_features* cpu_features = __get_cpu_features ();
 
-  __chacha20_sse2_blocks4 (state, dst, src,
-			   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+  /* AVX2 version uses vzeroupper, so disable it if RTM is enabled.  */
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+    __chacha20_avx2_blocks8 (state, dst, src,
+			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+  else
+    __chacha20_sse2_blocks4 (state, dst, src,
+			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
 }
-- 
2.32.0
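
Reviewer note on the counter setup in __chacha20_avx2_blocks8: AVX2 has
no unsigned 32-bit compare, so the code XORs both operands with the
L(unsigned_cmp) constant (0x80000000) and then uses the signed vpcmpgtd
to detect that adding the lane index wrapped the low counter word.  The
following is only a scalar sketch of that per-lane computation; the names
unsigned_gt_via_signed and lane_counters are hypothetical and not part of
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned A > B computed with a signed compare
   after flipping the sign bit of both operands.  The conversion to
   int32_t assumes two's complement wraparound, as GCC provides.  */
static int
unsigned_gt_via_signed (uint32_t a, uint32_t b)
{
  return (int32_t) (a ^ 0x80000000u) > (int32_t) (b ^ 0x80000000u);
}

/* Hypothetical helper: give lane I the 64-bit block counter HI:LO plus I,
   split into 32-bit words the way X12 (low) and X13 (high) are built.  */
static void
lane_counters (uint32_t lo, uint32_t hi, unsigned int nlanes,
               uint32_t *out_lo, uint32_t *out_hi)
{
  assert (nlanes <= 8);
  for (unsigned int i = 0; i < nlanes; i++)
    {
      uint32_t l = lo + i;      /* May wrap around.  */
      out_lo[i] = l;
      /* The add wrapped exactly when the lane index exceeds the sum.  */
      out_hi[i] = hi + (uint32_t) unsigned_gt_via_signed (i, l);
    }
}
```

In the assembly the compare result is an all-ones mask (-1) per lane, so
the carry is applied with vpsubd rather than an add.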



* [PATCH v3 7/9] powerpc64: Add optimized chacha20
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (5 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 6/9] x86: Add AVX2 " Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-20 18:38   ` Paul E Murphy
  2022-04-19 21:28 ` [PATCH v3 8/9] s390x: " Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random Adhemerval Zanella
  8 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

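This patch adds a vectorized ChaCha20 implementation based on
libgcrypt's cipher/chacha20-ppc.c.  It targets POWER8 and is used by
default on little-endian (LE); on big-endian it is selected at run time
when the hardware capabilities report ISA 2.07 and AltiVec.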

On a POWER8 machine it shows the following improvements over the
generic implementation (using formatted bench-arc4random data):

GENERIC (powerpc64-linux-gnu)
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               71.08
arc4random_buf(16) [single-thread]       141.26
arc4random_buf(32) [single-thread]       198.31
arc4random_buf(48) [single-thread]       226.78
arc4random_buf(64) [single-thread]       246.69
arc4random_buf(80) [single-thread]       257.23
arc4random_buf(96) [single-thread]       268.06
arc4random_buf(112) [single-thread]      274.50
arc4random_buf(128) [single-thread]      279.56
--------------------------------------------------

POWER8
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               84.68
arc4random_buf(16) [single-thread]       210.75
arc4random_buf(32) [single-thread]       366.11
arc4random_buf(48) [single-thread]       471.99
arc4random_buf(64) [single-thread]       567.06
arc4random_buf(80) [single-thread]       633.79
arc4random_buf(96) [single-thread]       693.16
arc4random_buf(112) [single-thread]      737.77
arc4random_buf(128) [single-thread]      774.38
--------------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
---
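
Reviewer note: the QUARTERROUND2 macro in chacha20-ppc.c applies the
ChaCha20 quarter round to two column (or diagonal) sets at once, with
each vector lane belonging to a different block.  As a reference point,
here is a minimal scalar sketch of a single quarter round as described
in RFC 8439; rotl32 and quarter_round are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits, 0 < N < 32.  */
static inline uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

/* One scalar ChaCha20 quarter round.  The rotation amounts 16, 12, 8
   and 7 match the rotate_16 ... rotate_7 vectors in the patch.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

The quarter-round test vector in RFC 8439 section 2.1.1 is a convenient
check for a sketch like this.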
 LICENSES                                  |   6 +-
 sysdeps/powerpc/powerpc64/Makefile        |   3 +
 sysdeps/powerpc/powerpc64/chacha20-ppc.c  | 236 ++++++++++++++++++++++
 sysdeps/powerpc/powerpc64/chacha20_arch.h |  47 +++++
 4 files changed, 289 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 05a5c07fcf..1c6c5d73e6 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,9 +390,9 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  License along with this library; if not, see
  <https://www.gnu.org/licenses/>.  */
 \f
-sysdeps/aarch64/chacha20.S, sysdeps/x86_64/chacha20-sse2.S, and
-sysdeps/x86_64/chacha20-avx2.S import code from libgcrypt, with the
-following notices:
+sysdeps/aarch64/chacha20.S, sysdeps/x86_64/chacha20-sse2.S,
+sysdeps/x86_64/chacha20-avx2.S, and sysdeps/powerpc/powerpc64/chacha20-ppc.c
+import code from libgcrypt, with the following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
 
diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile
index 679d5e49ba..18943ef09e 100644
--- a/sysdeps/powerpc/powerpc64/Makefile
+++ b/sysdeps/powerpc/powerpc64/Makefile
@@ -66,6 +66,9 @@ tst-setjmp-bug21895-static-ENV = \
 endif
 
 ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+
 CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec
 tests += tst-ucontext-ppc64-vscr
 endif
diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
new file mode 100644
index 0000000000..e2567c379a
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
@@ -0,0 +1,236 @@
+/* Optimized PowerPC implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <altivec.h>
+#include <endian.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <sys/cdefs.h>
+
+typedef vector unsigned char vector16x_u8;
+typedef vector unsigned int vector4x_u32;
+typedef vector unsigned long long vector2x_u64;
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+static const vector16x_u8 le_bswap_const =
+  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+#endif
+
+static inline vector4x_u32
+vec_rol_elems (vector4x_u32 v, unsigned int idx)
+{
+#if __BYTE_ORDER != __BIG_ENDIAN
+  return vec_sld (v, v, (16 - (4 * idx)) & 15);
+#else
+  return vec_sld (v, v, (4 * idx) & 15);
+#endif
+}
+
+static inline vector4x_u32
+vec_load_le (unsigned long offset, const unsigned char *ptr)
+{
+  vector4x_u32 vec;
+  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
+#if __BYTE_ORDER == __BIG_ENDIAN
+  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
+				 le_bswap_const);
+#endif
+  return vec;
+}
+
+static inline void
+vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
+{
+#if __BYTE_ORDER == __BIG_ENDIAN
+  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
+			       le_bswap_const);
+#endif
+  vec_vsx_st (vec, offset, (uint32_t *)ptr);
+}
+
+
+static inline vector4x_u32
+vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
+{
+#if __BYTE_ORDER == __BIG_ENDIAN
+  static const vector16x_u8 swap32 =
+    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
+  vector2x_u64 vec, add, sum;
+
+  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
+  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
+  sum = vec + add;
+  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
+#else
+  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
+#endif
+}
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define ROTATE(v1,rolv)			\
+	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
+
+#define PLUS(ds,s) \
+	((ds) += (s))
+
+#define XOR(ds,s) \
+	((ds) ^= (s))
+
+#define ADD_U64(v,a) \
+	(v = vec_add_ctr_u64(v, a))
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3) ({ \
+	vector4x_u32 t1 = vec_mergeh(x0, x2); \
+	vector4x_u32 t2 = vec_mergel(x0, x2); \
+	vector4x_u32 t3 = vec_mergeh(x1, x3); \
+	x3 = vec_mergel(x1, x3); \
+	x0 = vec_mergeh(t1, t3); \
+	x1 = vec_mergel(t1, t3); \
+	x2 = vec_mergeh(t2, x3); \
+	x3 = vec_mergel(t2, x3); \
+      })
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2)			\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16);	\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12);	\
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2);	\
+	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8);		\
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2);	\
+	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
+
+unsigned int attribute_hidden
+__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
+			   size_t nblks)
+{
+  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
+  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
+  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
+  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
+  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
+  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
+  vector4x_u32 state0, state1, state2, state3;
+  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
+  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
+  vector4x_u32 tmp;
+  int i;
+
+  /* Force preload of constants to vector registers.  */
+  __asm__ ("": "+v" (counters_0123) :: "memory");
+  __asm__ ("": "+v" (counter_4) :: "memory");
+  __asm__ ("": "+v" (rotate_16) :: "memory");
+  __asm__ ("": "+v" (rotate_12) :: "memory");
+  __asm__ ("": "+v" (rotate_8) :: "memory");
+  __asm__ ("": "+v" (rotate_7) :: "memory");
+
+  state0 = vec_vsx_ld (0 * 16, state);
+  state1 = vec_vsx_ld (1 * 16, state);
+  state2 = vec_vsx_ld (2 * 16, state);
+  state3 = vec_vsx_ld (3 * 16, state);
+
+  do
+    {
+      v0 = vec_splat (state0, 0);
+      v1 = vec_splat (state0, 1);
+      v2 = vec_splat (state0, 2);
+      v3 = vec_splat (state0, 3);
+      v4 = vec_splat (state1, 0);
+      v5 = vec_splat (state1, 1);
+      v6 = vec_splat (state1, 2);
+      v7 = vec_splat (state1, 3);
+      v8 = vec_splat (state2, 0);
+      v9 = vec_splat (state2, 1);
+      v10 = vec_splat (state2, 2);
+      v11 = vec_splat (state2, 3);
+      v12 = vec_splat (state3, 0);
+      v13 = vec_splat (state3, 1);
+      v14 = vec_splat (state3, 2);
+      v15 = vec_splat (state3, 3);
+
+      v12 += counters_0123;
+      v13 -= vec_cmplt (v12, counters_0123);
+
+      for (i = 20; i > 0; i -= 2)
+	{
+	  QUARTERROUND2 (v0, v4,  v8, v12,   v1, v5,  v9, v13)
+	  QUARTERROUND2 (v2, v6, v10, v14,   v3, v7, v11, v15)
+	  QUARTERROUND2 (v0, v5, v10, v15,   v1, v6, v11, v12)
+	  QUARTERROUND2 (v2, v7,  v8, v13,   v3, v4,  v9, v14)
+	}
+
+      v0 += vec_splat (state0, 0);
+      v1 += vec_splat (state0, 1);
+      v2 += vec_splat (state0, 2);
+      v3 += vec_splat (state0, 3);
+      v4 += vec_splat (state1, 0);
+      v5 += vec_splat (state1, 1);
+      v6 += vec_splat (state1, 2);
+      v7 += vec_splat (state1, 3);
+      v8 += vec_splat (state2, 0);
+      v9 += vec_splat (state2, 1);
+      v10 += vec_splat (state2, 2);
+      v11 += vec_splat (state2, 3);
+      tmp = vec_splat (state3, 0);
+      tmp += counters_0123;
+      v12 += tmp;
+      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
+      v14 += vec_splat (state3, 2);
+      v15 += vec_splat (state3, 3);
+      ADD_U64 (state3, counter_4);
+
+      transpose_4x4 (v0, v1, v2, v3);
+      transpose_4x4 (v4, v5, v6, v7);
+      transpose_4x4 (v8, v9, v10, v11);
+      transpose_4x4 (v12, v13, v14, v15);
+
+      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
+      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
+      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
+      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
+
+      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
+      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
+      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
+      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
+
+      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
+      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
+      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
+      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
+
+      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
+      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
+      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
+      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
+
+      src += 4*64;
+      dst += 4*64;
+
+      nblks -= 4;
+    }
+  while (nblks);
+
+  vec_vsx_st (state3, 3 * 16, state);
+
+  return 0;
+}
diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
new file mode 100644
index 0000000000..a18115392f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
@@ -0,0 +1,47 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+#include <ldsodefs.h>
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+					const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst,
+		const uint8_t *src, size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+#ifdef __LITTLE_ENDIAN__
+  __chacha20_power8_blocks4 (state, dst, src,
+			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+#else
+  unsigned long int hwcap = GLRO(dl_hwcap);
+  unsigned long int hwcap2 = GLRO(dl_hwcap2);
+  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
+    __chacha20_power8_blocks4 (state, dst, src,
+			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+  else
+    chacha20_crypt_generic (state, dst, src, bytes);
+#endif
+}
-- 
2.32.0
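
Reviewer note on the output layout in __chacha20_power8_blocks4: during
the rounds each vector v0..v15 holds one state word for four different
blocks, so the transpose_4x4 calls and the vec_store_le offsets
(64 * block + 16 * group) are what turn that "vertical" layout back into
four contiguous 64-byte blocks.  A scalar sketch of the index mapping
follows; store_blocks, LANES and WORDS are hypothetical names, and byte
order within each word is ignored here.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4, WORDS = 16 };

/* Hypothetical helper: V[j][lane] holds state word J of block LANE
   (the layout used during the rounds); OUT[lane][j] is the layout of
   four consecutive ChaCha20 blocks in the destination buffer.  */
static void
store_blocks (uint32_t v[WORDS][LANES], uint32_t out[LANES][WORDS])
{
  for (int group = 0; group < WORDS / 4; group++)
    for (int lane = 0; lane < LANES; lane++)
      {
        /* Equivalent to one 16-byte store at byte offset
           64 * lane + 16 * group after transposing the group.  */
        assert (4 * group + 3 < WORDS);
        for (int w = 0; w < 4; w++)
          out[lane][4 * group + w] = v[4 * group + w][lane];
      }
}
```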



* [PATCH v3 8/9] s390x: Add optimized chacha20
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (6 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 7/9] powerpc64: Add " Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-19 21:28 ` [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random Adhemerval Zanella
  8 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

This patch adds a vectorized ChaCha20 implementation based on
libgcrypt's cipher/chacha20-s390x.S.

On a z15 it shows the following improvements over the generic
implementation (using formatted bench-arc4random data):

GENERIC
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               150.27
arc4random_buf(16) [single-thread]       239.11
arc4random_buf(32) [single-thread]       268.36
arc4random_buf(48) [single-thread]       311.75
arc4random_buf(64) [single-thread]       300.16
arc4random_buf(80) [single-thread]       329.82
arc4random_buf(96) [single-thread]       305.98
arc4random_buf(112) [single-thread]      334.56
arc4random_buf(128) [single-thread]      313.29
--------------------------------------------------

VX
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               239.59
arc4random_buf(16) [single-thread]       552.42
arc4random_buf(32) [single-thread]       854.64
arc4random_buf(48) [single-thread]       1048.83
arc4random_buf(64) [single-thread]       1184.54
arc4random_buf(80) [single-thread]       1282.09
arc4random_buf(96) [single-thread]       1363.40
arc4random_buf(112) [single-thread]      1420.99
arc4random_buf(128) [single-thread]      1464.38
--------------------------------------------------

Checked on s390x-linux-gnu.
---
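
Reviewer note: to put the two tables above in relative terms, the
speedup is simply the VX throughput divided by the GENERIC throughput
for the same row.  A small sketch of that arithmetic (speedup is a
hypothetical helper, not part of the patch or of bench-arc4random):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ratio of optimized to generic throughput,
   both in MB/s as reported in the tables.  */
static double
speedup (double optimized_mbs, double generic_mbs)
{
  assert (generic_mbs > 0.0);
  return optimized_mbs / generic_mbs;
}

/* Print the ratio for the first and last rows of the z15 tables.  */
static void
print_z15_speedups (void)
{
  printf ("arc4random:          %.2fx\n", speedup (239.59, 150.27));
  printf ("arc4random_buf(128): %.2fx\n", speedup (1464.38, 313.29));
}
```

With the z15 numbers this works out to roughly 1.6x for arc4random and
roughly 4.7x for arc4random_buf(128).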
 LICENSES                             |   5 +-
 sysdeps/s390/s390-64/Makefile        |   4 +
 sysdeps/s390/s390-64/chacha20-vx.S   | 564 +++++++++++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h |  45 +++
 4 files changed, 616 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/s390/s390-64/chacha20-vx.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 1c6c5d73e6..12485408c7 100644
--- a/LICENSES
+++ b/LICENSES
@@ -391,8 +391,9 @@ Copyright 2001 by Stephen L. Moshier <moshier@na-net.ornl.gov>
  <https://www.gnu.org/licenses/>.  */
 \f
 sysdeps/aarch64/chacha20.S, sysdeps/x86_64/chacha20-sse2.S,
-sysdeps/x86_64/chacha20-avx2.S, and sysdeps/powerpc/powerpc64/chacha20-ppc.c
-import code from libgcrypt, with the following notices:
+sysdeps/x86_64/chacha20-avx2.S, sysdeps/powerpc/powerpc64/chacha20-ppc.c,
+and sysdeps/s390/s390-64/chacha20-vx.S import code from libgcrypt, with
+the following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna <jussi.kivilinna@iki.fi>
 
diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile
index 66ed844e68..5c50fb117e 100644
--- a/sysdeps/s390/s390-64/Makefile
+++ b/sysdeps/s390/s390-64/Makefile
@@ -67,3 +67,7 @@ tests-container += tst-glibc-hwcaps-cache
 endif
 
 endif # $(subdir) == elf
+
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-vx
+endif
diff --git a/sysdeps/s390/s390-64/chacha20-vx.S b/sysdeps/s390/s390-64/chacha20-vx.S
new file mode 100644
index 0000000000..5123e3c064
--- /dev/null
+++ b/sysdeps/s390/s390-64/chacha20-vx.S
@@ -0,0 +1,564 @@
+/* Optimized s390x implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+#ifdef HAVE_S390_VX_ASM_SUPPORT
+
+/* CFA expressions are used for pointing CFA and registers to
+ * SP relative offsets. */
+# define DW_REGNO_SP 15
+
+/* Fixed length encoding used for integers for now. */
+# define DW_SLEB128_7BIT(value) \
+        0x00|((value) & 0x7f)
+# define DW_SLEB128_28BIT(value) \
+        0x80|((value)&0x7f), \
+        0x80|(((value)>>7)&0x7f), \
+        0x80|(((value)>>14)&0x7f), \
+        0x00|(((value)>>21)&0x7f)
+
+# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \
+        .cfi_escape \
+          0x0f, /* DW_CFA_def_cfa_expression */ \
+            DW_SLEB128_7BIT(11), /* length */ \
+          0x7f, /* DW_OP_breg15, rsp + constant */ \
+            DW_SLEB128_28BIT(rsp_offs), \
+          0x06, /* DW_OP_deref */ \
+          0x23, /* DW_OP_plus_constu */ \
+            DW_SLEB128_28BIT((cfa_depth)+160)
+
+.machine "z13+vx"
+.text
+
+.balign 16
+.Lconsts:
+.Lwordswap:
+	.byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3
+.Lbswap128:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lbswap32:
+	.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
+.Lone:
+	.long 0, 0, 0, 1
+.Ladd_counter_0123:
+	.long 0, 1, 2, 3
+.Ladd_counter_4567:
+	.long 4, 5, 6, 7
+
+/* register macros */
+#define INPUT %r2
+#define DST   %r3
+#define SRC   %r4
+#define NBLKS %r0
+#define ROUND %r1
+
+/* stack structure */
+
+#define STACK_FRAME_STD    (8 * 16 + 8 * 4)
+#define STACK_FRAME_F8_F15 (8 * 8)
+#define STACK_FRAME_Y0_Y15 (16 * 16)
+#define STACK_FRAME_CTR    (4 * 16)
+#define STACK_FRAME_PARAMS (6 * 8)
+
+#define STACK_MAX   (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \
+		     STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \
+		     STACK_FRAME_PARAMS)
+
+#define STACK_F8     (STACK_MAX - STACK_FRAME_F8_F15)
+#define STACK_F9     (STACK_F8 + 8)
+#define STACK_F10    (STACK_F9 + 8)
+#define STACK_F11    (STACK_F10 + 8)
+#define STACK_F12    (STACK_F11 + 8)
+#define STACK_F13    (STACK_F12 + 8)
+#define STACK_F14    (STACK_F13 + 8)
+#define STACK_F15    (STACK_F14 + 8)
+#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15)
+#define STACK_CTR    (STACK_Y0_Y15 - STACK_FRAME_CTR)
+#define STACK_INPUT  (STACK_CTR - STACK_FRAME_PARAMS)
+#define STACK_DST    (STACK_INPUT + 8)
+#define STACK_SRC    (STACK_DST + 8)
+#define STACK_NBLKS  (STACK_SRC + 8)
+#define STACK_POCTX  (STACK_NBLKS + 8)
+#define STACK_POSRC  (STACK_POCTX + 8)
+
+#define STACK_G0_H3  STACK_Y0_Y15
+
+/* vector registers */
+#define A0 %v0
+#define A1 %v1
+#define A2 %v2
+#define A3 %v3
+
+#define B0 %v4
+#define B1 %v5
+#define B2 %v6
+#define B3 %v7
+
+#define C0 %v8
+#define C1 %v9
+#define C2 %v10
+#define C3 %v11
+
+#define D0 %v12
+#define D1 %v13
+#define D2 %v14
+#define D3 %v15
+
+#define E0 %v16
+#define E1 %v17
+#define E2 %v18
+#define E3 %v19
+
+#define F0 %v20
+#define F1 %v21
+#define F2 %v22
+#define F3 %v23
+
+#define G0 %v24
+#define G1 %v25
+#define G2 %v26
+#define G3 %v27
+
+#define H0 %v28
+#define H1 %v29
+#define H2 %v30
+#define H3 %v31
+
+#define IO0 E0
+#define IO1 E1
+#define IO2 E2
+#define IO3 E3
+#define IO4 F0
+#define IO5 F1
+#define IO6 F2
+#define IO7 F3
+
+#define S0 G0
+#define S1 G1
+#define S2 G2
+#define S3 G3
+
+#define TMP0 H0
+#define TMP1 H1
+#define TMP2 H2
+#define TMP3 H3
+
+#define X0 A0
+#define X1 A1
+#define X2 A2
+#define X3 A3
+#define X4 B0
+#define X5 B1
+#define X6 B2
+#define X7 B3
+#define X8 C0
+#define X9 C1
+#define X10 C2
+#define X11 C3
+#define X12 D0
+#define X13 D1
+#define X14 D2
+#define X15 D3
+
+#define Y0 E0
+#define Y1 E1
+#define Y2 E2
+#define Y3 E3
+#define Y4 F0
+#define Y5 F1
+#define Y6 F2
+#define Y7 F3
+#define Y8 G0
+#define Y9 G1
+#define Y10 G2
+#define Y11 G3
+#define Y12 H0
+#define Y13 H1
+#define Y14 H2
+#define Y15 H3
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+#define _ /*_*/
+
+#define CLEAR(x,...) vzero x;
+
+#define START_STACK(last_r) \
+	lgr %r0, %r15; \
+	lghi %r1, ~15; \
+	stmg %r6, last_r, 6 * 8(%r15); \
+	aghi %r0, -STACK_MAX; \
+	ngr %r0, %r1; \
+	lgr %r1, %r15; \
+	cfi_def_cfa_register(1); \
+	lgr %r15, %r0; \
+	stg %r1, 0(%r15); \
+	cfi_cfa_on_stack(0, 0); \
+	std %f8, STACK_F8(%r15); \
+	std %f9, STACK_F9(%r15); \
+	std %f10, STACK_F10(%r15); \
+	std %f11, STACK_F11(%r15); \
+	std %f12, STACK_F12(%r15); \
+	std %f13, STACK_F13(%r15); \
+	std %f14, STACK_F14(%r15); \
+	std %f15, STACK_F15(%r15);
+
+#define END_STACK(last_r) \
+	lg %r1, 0(%r15); \
+	ld %f8, STACK_F8(%r15); \
+	ld %f9, STACK_F9(%r15); \
+	ld %f10, STACK_F10(%r15); \
+	ld %f11, STACK_F11(%r15); \
+	ld %f12, STACK_F12(%r15); \
+	ld %f13, STACK_F13(%r15); \
+	ld %f14, STACK_F14(%r15); \
+	ld %f15, STACK_F15(%r15); \
+	lmg %r6, last_r, 6 * 8(%r1); \
+	lgr %r15, %r1; \
+	cfi_def_cfa_register(DW_REGNO_SP);
+
+#define PLUS(dst,src) \
+	vaf dst, dst, src;
+
+#define XOR(dst,src) \
+	vx dst, dst, src;
+
+#define ROTATE(v1,c) \
+	verllf v1, v1, (c)(0);
+
+#define WORD_ROTATE(v1,s) \
+	vsldb v1, v1, v1, ((s) * 4);
+
+#define DST_8(OPER, I, J) \
+	OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \
+	OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J);
+
+/**********************************************************************
+  round macros
+ **********************************************************************/
+
+/**********************************************************************
+  8-way chacha20 ("vertical")
+ **********************************************************************/
+
+#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
+			      x8,x9,x10,x11,x12,x13,x14,x15,\
+			      y0,y1,y2,y3,y4,y5,y6,y7,\
+			      y8,y9,y10,y11,y12,y13,y14,y15,\
+			      op1,op2,op3,op4,op5,op6,op7,op8,\
+			      op9,op10,op11,op12) \
+	op1;							\
+	PLUS(x0, x1); PLUS(x4, x5);				\
+	PLUS(x8, x9); PLUS(x12, x13);				\
+	PLUS(y0, y1); PLUS(y4, y5);				\
+	PLUS(y8, y9); PLUS(y12, y13);				\
+	    op2;						\
+	    XOR(x3, x0);  XOR(x7, x4);				\
+	    XOR(x11, x8); XOR(x15, x12);			\
+	    XOR(y3, y0);  XOR(y7, y4);				\
+	    XOR(y11, y8); XOR(y15, y12);			\
+		op3;						\
+		ROTATE(x3, 16); ROTATE(x7, 16);			\
+		ROTATE(x11, 16); ROTATE(x15, 16);		\
+		ROTATE(y3, 16); ROTATE(y7, 16);			\
+		ROTATE(y11, 16); ROTATE(y15, 16);		\
+	op4;							\
+	PLUS(x2, x3); PLUS(x6, x7);				\
+	PLUS(x10, x11); PLUS(x14, x15);				\
+	PLUS(y2, y3); PLUS(y6, y7);				\
+	PLUS(y10, y11); PLUS(y14, y15);				\
+	    op5;						\
+	    XOR(x1, x2); XOR(x5, x6);				\
+	    XOR(x9, x10); XOR(x13, x14);			\
+	    XOR(y1, y2); XOR(y5, y6);				\
+	    XOR(y9, y10); XOR(y13, y14);			\
+		op6;						\
+		ROTATE(x1,12); ROTATE(x5,12);			\
+		ROTATE(x9,12); ROTATE(x13,12);			\
+		ROTATE(y1,12); ROTATE(y5,12);			\
+		ROTATE(y9,12); ROTATE(y13,12);			\
+	op7;							\
+	PLUS(x0, x1); PLUS(x4, x5);				\
+	PLUS(x8, x9); PLUS(x12, x13);				\
+	PLUS(y0, y1); PLUS(y4, y5);				\
+	PLUS(y8, y9); PLUS(y12, y13);				\
+	    op8;						\
+	    XOR(x3, x0); XOR(x7, x4);				\
+	    XOR(x11, x8); XOR(x15, x12);			\
+	    XOR(y3, y0); XOR(y7, y4);				\
+	    XOR(y11, y8); XOR(y15, y12);			\
+		op9;						\
+		ROTATE(x3,8); ROTATE(x7,8);			\
+		ROTATE(x11,8); ROTATE(x15,8);			\
+		ROTATE(y3,8); ROTATE(y7,8);			\
+		ROTATE(y11,8); ROTATE(y15,8);			\
+	op10;							\
+	PLUS(x2, x3); PLUS(x6, x7);				\
+	PLUS(x10, x11); PLUS(x14, x15);				\
+	PLUS(y2, y3); PLUS(y6, y7);				\
+	PLUS(y10, y11); PLUS(y14, y15);				\
+	    op11;						\
+	    XOR(x1, x2); XOR(x5, x6);				\
+	    XOR(x9, x10); XOR(x13, x14);			\
+	    XOR(y1, y2); XOR(y5, y6);				\
+	    XOR(y9, y10); XOR(y13, y14);			\
+		op12;						\
+		ROTATE(x1,7); ROTATE(x5,7);			\
+		ROTATE(x9,7); ROTATE(x13,7);			\
+		ROTATE(y1,7); ROTATE(y5,7);			\
+		ROTATE(y9,7); ROTATE(y13,7);
+
+#define QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\
+			 y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \
+	QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\
+			      x8,x9,x10,x11,x12,x13,x14,x15,\
+			      y0,y1,y2,y3,y4,y5,y6,y7,\
+			      y8,y9,y10,y11,y12,y13,y14,y15,\
+			      ,,,,,,,,,,,)
+
+#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \
+	  vmrhf tmp0, v0, v1;					\
+	  vmrhf tmp1, v2, v3;					\
+	  vmrlf tmp2, v0, v1;					\
+	  vmrlf   v3, v2, v3;					\
+	  vmrhf tmpa, va, vb;					\
+	  vmrhf tmpb, vc, vd;					\
+	  vmrlf tmpc, va, vb;					\
+	  vmrlf   vd, vc, vd;					\
+	  vpdi v0, tmp0, tmp1, 0;				\
+	  vpdi v1, tmp0, tmp1, 5;				\
+	  vpdi v2, tmp2,   v3, 0;				\
+	  vpdi v3, tmp2,   v3, 5;				\
+	  vpdi va, tmpa, tmpb, 0;				\
+	  vpdi vb, tmpa, tmpb, 5;				\
+	  vpdi vc, tmpc,   vd, 0;				\
+	  vpdi vd, tmpc,   vd, 5;
+
+.balign 8
+.globl __chacha20_s390x_vx_blocks8
+ENTRY (__chacha20_s390x_vx_blocks8)
+	/* input:
+	 *	%r2: input
+	 *	%r3: dst
+	 *	%r4: src
+	 *	%r5: nblks (multiple of 8)
+	 */
+
+	START_STACK(%r8);
+	lgr NBLKS, %r5;
+
+	larl %r7, .Lconsts;
+
+	/* Load counter. */
+	lg %r8, (12 * 4)(INPUT);
+	rllg %r8, %r8, 32;
+
+.balign 4
+	/* Process eight chacha20 blocks per loop. */
+.Lloop8:
+	vlm Y0, Y3, 0(INPUT);
+
+	slgfi NBLKS, 8;
+	lghi ROUND, (20 / 2);
+
+	/* Construct counter vectors X12/X13 & Y12/Y13. */
+	vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7);
+	vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7);
+	vrepf Y12, Y3, 0;
+	vrepf Y13, Y3, 1;
+	vaccf X5, Y12, X4;
+	vaccf Y5, Y12, Y4;
+	vaf X12, Y12, X4;
+	vaf Y12, Y12, Y4;
+	vaf X13, Y13, X5;
+	vaf Y13, Y13, Y5;
+
+	vrepf X0, Y0, 0;
+	vrepf X1, Y0, 1;
+	vrepf X2, Y0, 2;
+	vrepf X3, Y0, 3;
+	vrepf X4, Y1, 0;
+	vrepf X5, Y1, 1;
+	vrepf X6, Y1, 2;
+	vrepf X7, Y1, 3;
+	vrepf X8, Y2, 0;
+	vrepf X9, Y2, 1;
+	vrepf X10, Y2, 2;
+	vrepf X11, Y2, 3;
+	vrepf X14, Y3, 2;
+	vrepf X15, Y3, 3;
+
+	/* Store counters for blocks 0-7. */
+	vstm X12, X13, (STACK_CTR + 0 * 16)(%r15);
+	vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15);
+
+	vlr Y0, X0;
+	vlr Y1, X1;
+	vlr Y2, X2;
+	vlr Y3, X3;
+	vlr Y4, X4;
+	vlr Y5, X5;
+	vlr Y6, X6;
+	vlr Y7, X7;
+	vlr Y8, X8;
+	vlr Y9, X9;
+	vlr Y10, X10;
+	vlr Y11, X11;
+	vlr Y14, X14;
+	vlr Y15, X15;
+
+	/* Update and store counter. */
+	agfi %r8, 8;
+	rllg %r5, %r8, 32;
+	stg %r5, (12 * 4)(INPUT);
+
+.balign 4
+.Lround2_8:
+	QUARTERROUND4_V8(X0, X4,  X8, X12,   X1, X5,  X9, X13,
+			 X2, X6, X10, X14,   X3, X7, X11, X15,
+			 Y0, Y4,  Y8, Y12,   Y1, Y5,  Y9, Y13,
+			 Y2, Y6, Y10, Y14,   Y3, Y7, Y11, Y15);
+	QUARTERROUND4_V8(X0, X5, X10, X15,   X1, X6, X11, X12,
+			 X2, X7,  X8, X13,   X3, X4,  X9, X14,
+			 Y0, Y5, Y10, Y15,   Y1, Y6, Y11, Y12,
+			 Y2, Y7,  Y8, Y13,   Y3, Y4,  Y9, Y14);
+	brctg ROUND, .Lround2_8;
+
+	/* Store blocks 4-7. */
+	vstm Y0, Y15, STACK_Y0_Y15(%r15);
+
+	/* Load counters for blocks 0-3. */
+	vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15);
+
+	lghi ROUND, 1;
+	j .Lfirst_output_4blks_8;
+
+.balign 4
+.Lsecond_output_4blks_8:
+	/* Load blocks 4-7. */
+	vlm X0, X15, STACK_Y0_Y15(%r15);
+
+	/* Load counters for blocks 4-7. */
+	vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15);
+
+	lghi ROUND, 0;
+
+.balign 4
+	/* Output four chacha20 blocks per loop. */
+.Lfirst_output_4blks_8:
+	vlm Y12, Y15, 0(INPUT);
+	PLUS(X12, Y0);
+	PLUS(X13, Y1);
+	vrepf Y0, Y12, 0;
+	vrepf Y1, Y12, 1;
+	vrepf Y2, Y12, 2;
+	vrepf Y3, Y12, 3;
+	vrepf Y4, Y13, 0;
+	vrepf Y5, Y13, 1;
+	vrepf Y6, Y13, 2;
+	vrepf Y7, Y13, 3;
+	vrepf Y8, Y14, 0;
+	vrepf Y9, Y14, 1;
+	vrepf Y10, Y14, 2;
+	vrepf Y11, Y14, 3;
+	vrepf Y14, Y15, 2;
+	vrepf Y15, Y15, 3;
+	PLUS(X0, Y0);
+	PLUS(X1, Y1);
+	PLUS(X2, Y2);
+	PLUS(X3, Y3);
+	PLUS(X4, Y4);
+	PLUS(X5, Y5);
+	PLUS(X6, Y6);
+	PLUS(X7, Y7);
+	PLUS(X8, Y8);
+	PLUS(X9, Y9);
+	PLUS(X10, Y10);
+	PLUS(X11, Y11);
+	PLUS(X14, Y14);
+	PLUS(X15, Y15);
+
+	vl Y15, (.Lbswap32 - .Lconsts)(%r7);
+	TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7,
+			Y9, Y10, Y11, Y12, Y13, Y14);
+	TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15,
+			Y9, Y10, Y11, Y12, Y13, Y14);
+
+	vlm Y0, Y14, 0(SRC);
+	vperm X0, X0, X0, Y15;
+	vperm X1, X1, X1, Y15;
+	vperm X2, X2, X2, Y15;
+	vperm X3, X3, X3, Y15;
+	vperm X4, X4, X4, Y15;
+	vperm X5, X5, X5, Y15;
+	vperm X6, X6, X6, Y15;
+	vperm X7, X7, X7, Y15;
+	vperm X8, X8, X8, Y15;
+	vperm X9, X9, X9, Y15;
+	vperm X10, X10, X10, Y15;
+	vperm X11, X11, X11, Y15;
+	vperm X12, X12, X12, Y15;
+	vperm X13, X13, X13, Y15;
+	vperm X14, X14, X14, Y15;
+	vperm X15, X15, X15, Y15;
+	vl Y15, (15 * 16)(SRC);
+
+	XOR(Y0, X0);
+	XOR(Y1, X4);
+	XOR(Y2, X8);
+	XOR(Y3, X12);
+	XOR(Y4, X1);
+	XOR(Y5, X5);
+	XOR(Y6, X9);
+	XOR(Y7, X13);
+	XOR(Y8, X2);
+	XOR(Y9, X6);
+	XOR(Y10, X10);
+	XOR(Y11, X14);
+	XOR(Y12, X3);
+	XOR(Y13, X7);
+	XOR(Y14, X11);
+	XOR(Y15, X15);
+	vstm Y0, Y15, 0(DST);
+
+	aghi SRC, 256;
+	aghi DST, 256;
+
+	clgije ROUND, 1, .Lsecond_output_4blks_8;
+
+	clgijhe NBLKS, 8, .Lloop8;
+
+	/* Clear the used vector registers. */
+	DST_8(CLEAR, 0, _);
+	DST_8(CLEAR, 1, _);
+	DST_8(CLEAR, 2, _);
+	DST_8(CLEAR, 3, _);
+
+	/* Clear sensitive data in stack. */
+	vlm Y0, Y15, STACK_Y0_Y15(%r15);
+	vlm Y0, Y3, STACK_CTR(%r15);
+
+	END_STACK(%r8);
+	xgr %r2, %r2;
+	br %r14;
+END (__chacha20_s390x_vx_blocks8)
+
+#endif /* HAVE_S390_VX_ASM_SUPPORT */
diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h
new file mode 100644
index 0000000000..78252c5488
--- /dev/null
+++ b/sysdeps/s390/s390-64/chacha20_arch.h
@@ -0,0 +1,45 @@
+/* s390x optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+#include <ldsodefs.h>
+#include <sys/auxv.h>
+
+unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst,
+					  const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static inline void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+		size_t bytes)
+{
+#ifdef HAVE_S390_VX_ASM_SUPPORT
+  _Static_assert (CHACHA20_BUFSIZE % 8 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 8");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8,
+		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8");
+
+  if (GLRO(dl_hwcap) & HWCAP_S390_VX)
+    {
+      __chacha20_s390x_vx_blocks8 (state, dst, src,
+				   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+      return;
+    }
+#endif
+  chacha20_crypt_generic (state, dst, src, bytes);
+}
-- 
2.32.0



* [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random
  2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
                   ` (7 preceding siblings ...)
  2022-04-19 21:28 ` [PATCH v3 8/9] s390x: " Adhemerval Zanella
@ 2022-04-19 21:28 ` Adhemerval Zanella
  2022-04-22 16:02   ` Yann Droneaud
  8 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-19 21:28 UTC (permalink / raw)
  To: libc-alpha

The arc4random state is moved to the TCB, so there is no allocation
failure.  It adds about 592 bytes to struct pthread.
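As a rough check of the 592-byte figure, the sketch below repeats the struct arc4random_state layout from stdlib/arc4random.h in this patch and sums its members.  The helper name expected_state_size is hypothetical and exists only for this illustration; the 592 value assumes an LP64 target where size_t is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants and layout copied from stdlib/arc4random.h in this patch.  */
#define CHACHA20_STATE_LEN	16
#define CHACHA20_BLOCK_SIZE	64
#define CHACHA20_BUFSIZE	(8 * CHACHA20_BLOCK_SIZE)

struct arc4random_state
{
  uint32_t ctx[CHACHA20_STATE_LEN];	/* 16 * 4 = 64 bytes.  */
  size_t have;				/* 8 bytes on LP64.  */
  size_t count;				/* 8 bytes on LP64.  */
  uint8_t buf[CHACHA20_BUFSIZE];	/* 8 * 64 = 512 bytes.  */
};

/* Hypothetical helper (not part of the patch): sum of the member sizes.
   On LP64 this is 64 + 8 + 8 + 512 = 592.  */
static size_t
expected_state_size (void)
{
  size_t total = sizeof (uint32_t) * CHACHA20_STATE_LEN
		 + 2 * sizeof (size_t)
		 + CHACHA20_BUFSIZE;
  /* The members are naturally aligned, so no padding is expected.  */
  assert (total == sizeof (struct arc4random_state));
  return total;
}
```

On a 32-bit target the two size_t counters shrink to 4 bytes each, so the same sum would be 584 bytes rather than 592.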

Now that the state is thread private within a shared struct, the
MADV_WIPEONFORK usage is removed.  The cipher state reset is done
solely by the atfork internal handler.

The state is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).
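The lifecycle described above relies on count == -1 as a "not yet keyed" marker.  The following is a minimal sketch of that idea only, not the glibc code: every sketch_* name is hypothetical, and the "first use" step writes a placeholder pattern instead of real kernel entropy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's state and sentinel.  */
#define SKETCH_BUFSIZE	512
#define SKETCH_RESEED	(16 * 1024 * 1024)
#define SKETCH_UNINIT	((size_t) -1)

struct sketch_state
{
  uint32_t ctx[16];
  size_t have;
  size_t count;
  uint8_t buf[SKETCH_BUFSIZE];
};

/* Thread creation: only the sentinel is written; the rest of the memory
   is assumed to be already zero.  */
static void
sketch_thread_init (struct sketch_state *s)
{
  s->count = SKETCH_UNINIT;
}

/* True if the next request has to key the cipher first.  */
static bool
sketch_needs_key_init (const struct sketch_state *s)
{
  return s->count == SKETCH_UNINIT;
}

/* First use: placeholder for seeding from kernel entropy.  */
static void
sketch_first_use (struct sketch_state *s)
{
  memset (s->ctx, 0xA5, sizeof s->ctx);	/* Not real key material.  */
  s->have = 0;
  s->count = SKETCH_RESEED;
}

/* Thread exit: wipe only if the state was ever used.  Returns true if
   a wipe happened.  */
static bool
sketch_thread_exit (struct sketch_state *s)
{
  if (s->count == SKETCH_UNINIT)
    return false;
  memset (s, 0, sizeof *s);
  s->count = SKETCH_UNINIT;
  return true;
}
```

A thread that never calls arc4random pays only for the single store in sketch_thread_init; the 592-byte wipe happens only for threads that actually keyed the cipher.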

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

On x86_64 using AVX2 it shows a slightly better performance:

From                                     MB/s
--------------------------------------------------
arc4random [single-thread]               809.53
arc4random_buf(16) [single-thread]       1242.56
arc4random_buf(32) [single-thread]       1915.90
arc4random_buf(48) [single-thread]       2230.03
arc4random_buf(64) [single-thread]       2429.68
arc4random_buf(80) [single-thread]       2489.70
arc4random_buf(96) [single-thread]       2598.88
arc4random_buf(112) [single-thread]      2699.93
arc4random_buf(128) [single-thread]      2747.31
--------------------------------------------------

To                                       MB/s
--------------------------------------------------
arc4random [single-thread]               941.54
arc4random_buf(16) [single-thread]       1409.39
arc4random_buf(32) [single-thread]       2056.17
arc4random_buf(48) [single-thread]       2367.13
arc4random_buf(64) [single-thread]       2551.44
arc4random_buf(80) [single-thread]       2601.38
arc4random_buf(96) [single-thread]       2710.21
arc4random_buf(112) [single-thread]      2797.86
arc4random_buf(128) [single-thread]      2846.12
--------------------------------------------------

However, it shows a large speedup, especially on architectures with
more costly atomics.  For instance, on an aarch64 Neoverse N1:

From                                     MB/s
--------------------------------------------------
arc4random [single-thread]               154.98
arc4random_buf(16) [single-thread]       342.63
arc4random_buf(32) [single-thread]       485.91
arc4random_buf(48) [single-thread]       539.95
arc4random_buf(64) [single-thread]       593.38
arc4random_buf(80) [single-thread]       629.45
arc4random_buf(96) [single-thread]       655.78
arc4random_buf(112) [single-thread]      670.54
arc4random_buf(128) [single-thread]      681.65
--------------------------------------------------

To                                       MB/s
--------------------------------------------------
arc4random [single-thread]               335.94
arc4random_buf(16) [single-thread]       498.69
arc4random_buf(32) [single-thread]       612.24
arc4random_buf(48) [single-thread]       655.77
arc4random_buf(64) [single-thread]       691.97
arc4random_buf(80) [single-thread]       701.68
arc4random_buf(96) [single-thread]       710.35
arc4random_buf(112) [single-thread]      714.23
arc4random_buf(128) [single-thread]      722.13
--------------------------------------------------
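To put the two result sets side by side, the small sketch below computes the throughput ratio for the plain arc4random row of each table.  The input numbers are copied from the tables above; the helper name speedup is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: ratio of new to old throughput (both in MB/s).  */
static double
speedup (double from_mbs, double to_mbs)
{
  assert (from_mbs > 0.0);
  return to_mbs / from_mbs;
}

/* arc4random [single-thread] rows from the tables above.  */
static const double x86_64_from = 809.53, x86_64_to = 941.54;
static const double aarch64_from = 154.98, aarch64_to = 335.94;
```

With these inputs the ratio is roughly 1.16 on x86_64 and roughly 2.17 on the Neoverse N1, which matches the "slightly better" versus "large speedup" wording.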

Checked on x86_64-linux-gnu.
---
 nptl/allocatestack.c                   |   5 +-
 stdlib/arc4random.c                    | 137 +++++++------------------
 stdlib/arc4random.h                    |  45 ++++++++
 stdlib/arc4random_uniform.c            |   8 +-
 stdlib/chacha20.c                      |   3 -
 stdlib/tst-arc4random-chacha20.c       |   2 +-
 sysdeps/generic/tls-internal-struct.h  |   3 +
 sysdeps/unix/sysv/linux/tls-internal.h |  27 ++++-
 8 files changed, 115 insertions(+), 115 deletions(-)
 create mode 100644 stdlib/arc4random.h

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 01a282f3f6..ada65d40c2 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -32,6 +32,7 @@
 #include <kernel-features.h>
 #include <nptl-stack.h>
 #include <libc-lock.h>
+#include <tls-internal.h>
 
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
@@ -127,7 +128,7 @@ get_cached_stack (size_t *sizep, void **memp)
 
   result->exiting = false;
   __libc_lock_init (result->exit_lock);
-  result->tls_state = (struct tls_internal_t) { 0 };
+  __glibc_tls_internal_init (&result->tls_state);
 
   /* Clear the DTV.  */
   dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
@@ -559,6 +560,8 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 #endif
   pd->robust_head.list = &pd->robust_head;
 
+  __glibc_tls_internal_init (&pd->tls_state);
+
   /* We place the thread descriptor at the end of the stack.  */
   *pdp = pd;
 
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
index cddb0e405a..6144275c08 100644
--- a/stdlib/arc4random.c
+++ b/stdlib/arc4random.c
@@ -16,14 +16,15 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <arc4random.h>
 #include <errno.h>
-#include <libc-lock.h>
 #include <not-cancel.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <sys/param.h>
 #include <sys/random.h>
+#include <tls-internal.h>
 
 /* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
    current valid bytes not yet consumed in 'buf', while 'count' is the maximum
@@ -37,42 +38,16 @@
    arc4random calls (since only multiple call it will encrypt the next block).
  */
 
-/* Maximum number bytes until reseed (16 MB).  */
-#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
-/* Internal buffer size in bytes (1KB).  */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
-
 #include <chacha20.c>
 
-static struct arc4random_state
-{
-  uint32_t ctx[CHACHA20_STATE_LEN];
-  size_t have;
-  size_t count;
-  uint8_t buf[CHACHA20_BUFSIZE];
-} *state;
-
-/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
-   it does not require to clear the internal state.  */
-static bool __arc4random_wipeonfork = false;
-
-__libc_lock_define_initialized (, __arc4random_lock);
-
-/* Called from the fork function to reset the state if MADV_WIPEONFORK is
-   not supported and to reinit the internal lock.  */
+/* Called from the fork function to reset the state.  */
 void
 __arc4random_fork_subprocess (void)
 {
-  if (__arc4random_wipeonfork && state != NULL)
-    memset (state, 0, sizeof (struct arc4random_state));
-
-  __libc_lock_init (__arc4random_lock);
-}
-
-static void
-arc4random_allocate_failure (void)
-{
-  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
+  struct arc4random_state *state = &__glibc_tls_internal()->rnd_state;
+  memset (state, 0, sizeof (struct arc4random_state));
+  /* Force key init.  */
+  state->count = -1;
 }
 
 static void
@@ -81,33 +56,10 @@ arc4random_getrandom_failure (void)
   __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
 }
 
-/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
-   the fork callback will reset the state on the fork call.  It does not
-   handle direct clone calls, nor vfork or _Fork (arc4random is not
-   async-signal-safe due the internal lock usage).  */
-static void
-arc4random_init (uint8_t *buf, size_t len)
-{
-  state = __mmap (NULL, sizeof (struct arc4random_state),
-		  PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-  if (state == MAP_FAILED)
-    arc4random_allocate_failure ();
-
-#ifdef MADV_WIPEONFORK
-  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
-  if (r == 0)
-    __arc4random_wipeonfork = true;
-  else if (errno != EINVAL)
-    arc4random_allocate_failure ();
-#endif
-
-  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
-}
-
 #define min(x,y) (((x) > (y)) ? (y) : (x))
 
 static void
-arc4random_rekey (uint8_t *rnd, size_t rndlen)
+arc4random_rekey (struct arc4random_state *state, uint8_t *rnd, size_t rndlen)
 {
   memset (state->buf, 0, sizeof state->buf);
   chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
@@ -152,41 +104,41 @@ arc4random_getentropy (uint8_t *rnd, size_t len)
   arc4random_getrandom_failure ();
 }
 
-/* Either allocates the state buffer or reinit it by reseeding the cipher
-   state with kernel entropy.  */
-static void
-arc4random_stir (void)
+/* Reinit the thread context by reseeding the cipher state with kernel
+   entropy.  */
+static struct arc4random_state *
+arc4random_check_stir (size_t len)
 {
-  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
-  arc4random_getentropy (rnd, sizeof rnd);
+  struct arc4random_state *state = &__glibc_tls_internal()->rnd_state;
 
-  if (state == NULL)
-    arc4random_init (rnd, sizeof rnd);
-  else
-    arc4random_rekey (rnd, sizeof rnd);
+  if (state->count < len || state->count == -1)
+    {
+      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
+      arc4random_getentropy (rnd, sizeof rnd);
 
-  explicit_bzero (rnd, sizeof rnd);
+      if (state->count > CHACHE_RESEED_SIZE)
+	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
+      else
+	arc4random_rekey (state, rnd, sizeof rnd);
 
-  state->have = 0;
-  memset (state->buf, 0, sizeof state->buf);
-  state->count = CHACHE_RESEED_SIZE;
-}
+      explicit_bzero (rnd, sizeof rnd);
 
-static void
-arc4random_check_stir (size_t len)
-{
-  if (state == NULL || state->count < len)
-    arc4random_stir ();
+      state->have = 0;
+      memset (state->buf, 0, sizeof state->buf);
+      state->count = CHACHE_RESEED_SIZE;
+    }
   if (state->count <= len)
     state->count = 0;
   else
     state->count -= len;
+
+  return state;
 }
 
 void
-__arc4random_buf_internal (void *buffer, size_t len)
+__arc4random_buf (void *buffer, size_t len)
 {
-  arc4random_check_stir (len);
+  struct arc4random_state *state = arc4random_check_stir (len);
 
   while (len > 0)
     {
@@ -201,29 +153,20 @@ __arc4random_buf_internal (void *buffer, size_t len)
 	  state->have -= m;
 	}
       if (state->have == 0)
-	arc4random_rekey (NULL, 0);
+	arc4random_rekey (state, NULL, 0);
     }
 }
-
-void
-__arc4random_buf (void *buffer, size_t len)
-{
-  __libc_lock_lock (__arc4random_lock);
-  __arc4random_buf_internal (buffer, len);
-  __libc_lock_unlock (__arc4random_lock);
-}
 libc_hidden_def (__arc4random_buf)
 weak_alias (__arc4random_buf, arc4random_buf)
 
-
-static uint32_t
-__arc4random_internal (void)
+uint32_t
+__arc4random (void)
 {
   uint32_t r;
 
-  arc4random_check_stir (sizeof (uint32_t));
+  struct arc4random_state *state = arc4random_check_stir (sizeof (uint32_t));
   if (state->have < sizeof (uint32_t))
-    arc4random_rekey (NULL, 0);
+    arc4random_rekey (state, NULL, 0);
   uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
   memcpy (&r, ks, sizeof (uint32_t));
   memset (ks, 0, sizeof (uint32_t));
@@ -231,15 +174,5 @@ __arc4random_internal (void)
 
   return r;
 }
-
-uint32_t
-__arc4random (void)
-{
-  uint32_t r;
-  __libc_lock_lock (__arc4random_lock);
-  r = __arc4random_internal ();
-  __libc_lock_unlock (__arc4random_lock);
-  return r;
-}
 libc_hidden_def (__arc4random)
 weak_alias (__arc4random, arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
new file mode 100644
index 0000000000..40672299d0
--- /dev/null
+++ b/stdlib/arc4random.h
@@ -0,0 +1,45 @@
+/* Arc4random definition used on TLS.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _CHACHA20_H
+#define _CHACHA20_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+/* Internal ChaCha20 state.  */
+#define CHACHA20_STATE_LEN	16
+#define CHACHA20_BLOCK_SIZE	64
+
+/* Maximum number bytes until reseed (16 MB).  */
+#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
+
+/* Internal arc4random buffer, used on each feedback step to offer some
+   backtracking protection and to allow better use of vectorized
+   chacha20 implementations.  */
+#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
+
+struct arc4random_state
+{
+  uint32_t ctx[CHACHA20_STATE_LEN];
+  size_t have;
+  size_t count;
+  uint8_t buf[CHACHA20_BUFSIZE];
+};
+
+#endif
diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
index 96ffe62df1..7d0140c375 100644
--- a/stdlib/arc4random_uniform.c
+++ b/stdlib/arc4random_uniform.c
@@ -46,7 +46,7 @@ random_bytes (uint32_t *result, uint32_t byte_count)
   unsigned char *ptr = (unsigned char *) result;
   if (__BYTE_ORDER == __BIG_ENDIAN)
     ptr += 4 - byte_count;
-  __arc4random_buf_internal (ptr, byte_count);
+  __arc4random_buf (ptr, byte_count);
 }
 
 static uint32_t
@@ -142,11 +142,7 @@ __libc_lock_define (extern , __arc4random_lock attribute_hidden)
 uint32_t
 __arc4random_uniform (uint32_t upper_bound)
 {
-  uint32_t r;
-  __libc_lock_lock (__arc4random_lock);
-  r = compute_uniform (upper_bound);
-  __libc_lock_unlock (__arc4random_lock);
-  return r;
+  return compute_uniform (upper_bound);
 }
 libc_hidden_def (__arc4random_uniform)
 weak_alias (__arc4random_uniform, arc4random_uniform)
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
index fea4994169..0fb55c0fa3 100644
--- a/stdlib/chacha20.c
+++ b/stdlib/chacha20.c
@@ -26,11 +26,8 @@
 #define CHACHA20_IV_SIZE	16
 #define CHACHA20_KEY_SIZE	32
 
-#define CHACHA20_BLOCK_SIZE     64
 #define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
 
-#define CHACHA20_STATE_LEN	16
-
 /* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
    by RFC8439.  Since the input stream will either zero bytes (initial state)
    or the PRNG output itself this step does not add any extra entropy.   */
diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
index dd0ef6d8ba..614e6e0736 100644
--- a/stdlib/tst-arc4random-chacha20.c
+++ b/stdlib/tst-arc4random-chacha20.c
@@ -16,11 +16,11 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <arc4random.h>
 #include <support/check.h>
 #include <sys/cdefs.h>
 
 /* It does not define CHACHA20_XOR_FINAL to check what glibc actual uses. */
-#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
 #include <chacha20.c>
 
 static int
diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
index d76c715a96..5d0e2fba53 100644
--- a/sysdeps/generic/tls-internal-struct.h
+++ b/sysdeps/generic/tls-internal-struct.h
@@ -19,10 +19,13 @@
 #ifndef _TLS_INTERNAL_STRUCT_H
 #define _TLS_INTERNAL_STRUCT_H 1
 
+#include <stdlib/arc4random.h>
+
 struct tls_internal_t
 {
   char *strsignal_buf;
   char *strerror_l_buf;
+  struct arc4random_state rnd_state;
 };
 
 #endif
diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
index f7a1a62135..16ff836d05 100644
--- a/sysdeps/unix/sysv/linux/tls-internal.h
+++ b/sysdeps/unix/sysv/linux/tls-internal.h
@@ -22,6 +22,19 @@
 #include <stdlib.h>
 #include <pthreadP.h>
 
+static inline void
+__glibc_tls_internal_init (struct tls_internal_t *tls_state)
+{
+  tls_state->strsignal_buf = NULL;
+  tls_state->strerror_l_buf = NULL;
+
+  /* Force key init on created threads.  There is no need to clear the
+     initial state since it will be done either by allocating a new
+     stack (through mmap with MAP_ANONYMOUS) or by the free function
+     below.  */
+  tls_state->rnd_state.count = -1;
+}
+
 static inline struct tls_internal_t *
 __glibc_tls_internal (void)
 {
@@ -31,8 +44,18 @@ __glibc_tls_internal (void)
 static inline void
 __glibc_tls_internal_free (void)
 {
-  free (THREAD_SELF->tls_state.strsignal_buf);
-  free (THREAD_SELF->tls_state.strerror_l_buf);
+  struct pthread *self = THREAD_SELF;
+  free (self->tls_state.strsignal_buf);
+  free (self->tls_state.strerror_l_buf);
+  if (self->tls_state.rnd_state.count != -1)
+    {
+      /* Clear any lingering random state so that, if the thread stack
+	 is cached, it won't leak any data.  */
+      memset (&self->tls_state.rnd_state, 0,
+	      sizeof self->tls_state.rnd_state);
+      /* Force key init on created threads.  */
+      self->tls_state.rnd_state.count = -1;
+    }
 }
 
 #endif
-- 
2.32.0



* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
@ 2022-04-19 21:52   ` H.J. Lu
  2022-04-20 12:38     ` Adhemerval Zanella
  2022-04-22 13:54   ` Yann Droneaud
  2022-04-25  2:22   ` Mark Harris
  2 siblings, 1 reply; 22+ messages in thread
From: H.J. Lu @ 2022-04-19 21:52 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: GNU C Library, Florian Weimer

On Tue, Apr 19, 2022 at 2:29 PM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> The implementation is based on scalar Chacha20, with global cache and
> locking.  It uses getrandom or /dev/urandom as fallback to get the
> initial entropy, and reseeds the internal state on every 16MB of
> consumed buffer.
>
> It maintains an internal buffer which consumes at maximum one page on
> most systems (assuming minimum of 4k pages).  The internal buf optimizes
> the cipher encrypt calls, by amortize arc4random calls (where both
> function call and locks cost are the dominating factor).
>
> The ChaCha20 implementation is based on the RFC8439 [1], with last
> step that XOR with the input omited.  Since the input stream will either
> zero bytes (initial state) or the PRNG output itself this step does not
> add any extra entropy.
>
> The arc4random_uniform is based on previous work by Florian Weimer.
>
> Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
>
> Co-authored-by: Florian Weimer <fweimer@redhat.com>
>
> [1] https://datatracker.ietf.org/doc/html/rfc8439
> ---
>  NEWS                                          |   4 +-
>  include/stdlib.h                              |  13 +
>  posix/fork.c                                  |   2 +
>  stdlib/Makefile                               |   2 +
>  stdlib/Versions                               |   5 +
>  stdlib/arc4random.c                           | 245 ++++++++++++++++++
>  stdlib/arc4random_uniform.c                   | 152 +++++++++++
>  stdlib/chacha20.c                             | 163 ++++++++++++
>  stdlib/stdlib.h                               |  14 +
>  sysdeps/generic/not-cancel.h                  |   2 +
>  sysdeps/mach/hurd/i386/libc.abilist           |   3 +
>  sysdeps/mach/hurd/not-cancel.h                |   3 +
>  sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
>  sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
>  sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
>  .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
>  .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
>  .../sysv/linux/microblaze/be/libc.abilist     |   3 +
>  .../sysv/linux/microblaze/le/libc.abilist     |   3 +
>  .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
>  .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
>  .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
>  .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
>  sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
>  .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
>  .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
>  .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
>  .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
>  .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
>  .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
>  sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
>  .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
>  .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
>  .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
>  .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
>  46 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 stdlib/arc4random.c
>  create mode 100644 stdlib/arc4random_uniform.c
>  create mode 100644 stdlib/chacha20.c
>
> diff --git a/NEWS b/NEWS
> index 4b6d9de2b5..4d9d95b35b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,9 @@ Version 2.36
>
>  Major new features:
>
> -  [Add new features here]
> +* The functions arc4random, arc4random_buf, arc4random_uniform have been
> +  added.  The functions use a cryptographic pseudo-random number generator
> +  based on ChaCha20 initilized with entropy from kernel.
                                         ^^^^^^^^ Typo: "initilized" -> "initialized".
>
>  Deprecated and removed features, and other changes affecting compatibility:
>
> diff --git a/include/stdlib.h b/include/stdlib.h
> index 1c6f70b082..055f9d2965 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
>  libc_hidden_proto (grantpt)
>  libc_hidden_proto (unlockpt)
>
> +__typeof (arc4random) __arc4random;
> +libc_hidden_proto (__arc4random);
> +__typeof (arc4random_buf) __arc4random_buf;
> +libc_hidden_proto (__arc4random_buf);
> +__typeof (arc4random_uniform) __arc4random_uniform;
> +libc_hidden_proto (__arc4random_uniform);
> +extern void __arc4random_buf_internal (void *buffer, size_t len)
> +     attribute_hidden;
> +/* Called from the fork function to reinitialize the internal lock in thte
> +   child process.  This avoids deadlocks if fork is called in multi-threaded
> +   processes.  */
> +extern void __arc4random_fork_subprocess (void) attribute_hidden;
> +
>  extern double __strtod_internal (const char *__restrict __nptr,
>                                  char **__restrict __endptr, int __group)
>       __THROW __nonnull ((1)) __wur;
> diff --git a/posix/fork.c b/posix/fork.c
> index 6b50c091f9..87d8329b46 100644
> --- a/posix/fork.c
> +++ b/posix/fork.c
> @@ -96,6 +96,8 @@ __libc_fork (void)
>                                      &nss_database_data);
>         }
>
> +      call_function_static_weak (__arc4random_fork_subprocess);
> +
>        /* Reset the lock the dynamic loader uses to protect its data.  */
>        __rtld_lock_initialize (GL(dl_load_lock));
>
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index 60fc59c12c..9f9cc1bd7f 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -53,6 +53,8 @@ routines := \
>    a64l \
>    abort \
>    abs \
> +  arc4random \
> +  arc4random_uniform \
>    at_quick_exit \
>    atof \
>    atoi \
> diff --git a/stdlib/Versions b/stdlib/Versions
> index 5e9099a153..d09a308fb5 100644
> --- a/stdlib/Versions
> +++ b/stdlib/Versions
> @@ -136,6 +136,11 @@ libc {
>      strtof32; strtof64; strtof32x;
>      strtof32_l; strtof64_l; strtof32x_l;
>    }
> +  GLIBC_2.36 {
> +    arc4random;
> +    arc4random_buf;
> +    arc4random_uniform;
> +  }
>    GLIBC_PRIVATE {
>      # functions which have an additional interface since they are
>      # are cancelable.
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> new file mode 100644
> index 0000000000..cddb0e405a
> --- /dev/null
> +++ b/stdlib/arc4random.c
> @@ -0,0 +1,245 @@
> +/* Pseudo Random Number Generator based on ChaCha20.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <libc-lock.h>
> +#include <not-cancel.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/mman.h>
> +#include <sys/param.h>
> +#include <sys/random.h>
> +
> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
> +   number of bytes until a reseed.
> +
> +   Both the initial seed an reseed tries to obtain entropy from the kernel
                                         ^^^^^^^^^^^^^^^^ Typo?  "an reseed" -> "and reseed".
> +   and abort the process if none could be obtained.
> +
> +   The state 'buf' improves the usage of the cipher call, allowing to call
> +   optimized implementations (if the archictecture provides it) and optimize
                                                              ^^^^^^^^^^^^ Typo?  "archictecture" -> "architecture".
> +   arc4random calls (since only multiple call it will encrypt the next block).
> + */
> +
> +/* Maximum number bytes until reseed (16 MB).  */
> +#define CHACHE_RESEED_SIZE     (16 * 1024 * 1024)
> +/* Internal buffer size in bytes (1KB).  */
> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> +
> +#include <chacha20.c>
> +
> +static struct arc4random_state
> +{
> +  uint32_t ctx[CHACHA20_STATE_LEN];
> +  size_t have;
> +  size_t count;
> +  uint8_t buf[CHACHA20_BUFSIZE];
> +} *state;
> +
> +/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
> +   it does not require to clear the internal state.  */
> +static bool __arc4random_wipeonfork = false;
> +
> +__libc_lock_define_initialized (, __arc4random_lock);
> +
> +/* Called from the fork function to reset the state if MADV_WIPEONFORK is
> +   not supported and to reinit the internal lock.  */
> +void
> +__arc4random_fork_subprocess (void)
> +{
> +  if (__arc4random_wipeonfork && state != NULL)

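Should this be !__arc4random_wipeonfork?  According to the comment above the
function, the state only needs to be reset here when MADV_WIPEONFORK is not
supported.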

> +    memset (state, 0, sizeof (struct arc4random_state));
> +
> +  __libc_lock_init (__arc4random_lock);
> +}
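
To make the question about the condition concrete, here is a minimal sketch
of the handler with the test negated.  It assumes the intent is the one the
comment states: clear the state by hand only when MADV_WIPEONFORK is not
available.  All demo_* names are hypothetical stand-ins and are not part of
the patch; the lock reinitialization is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct arc4random_state; the field sizes are
   illustrative only.  */
struct demo_state
{
  unsigned char buf[64];
  size_t have;
};

static struct demo_state *demo_state_ptr;
static bool demo_wipeonfork;

/* Child-side fork handler: clear the state by hand only when the kernel
   does not already do so through MADV_WIPEONFORK.  */
static void
demo_fork_subprocess (void)
{
  if (!demo_wipeonfork && demo_state_ptr != NULL)
    memset (demo_state_ptr, 0, sizeof (struct demo_state));
}

/* Run the handler on a state filled with 0xab and report whether the
   handler left the state untouched.  */
static bool
demo_state_survives (bool wipeonfork)
{
  static struct demo_state st;
  memset (&st, 0xab, sizeof st);
  demo_state_ptr = &st;
  demo_wipeonfork = wipeonfork;
  demo_fork_subprocess ();
  return st.buf[0] == 0xab;
}

/* Usage: without kernel support the handler must clear the state; with
   kernel support it must not touch it.  */
static void
demo_fork_self_check (void)
{
  assert (!demo_state_survives (false));
  assert (demo_state_survives (true));
}
```

With the condition as posted, the two cases would be swapped: the state would
be cleared only when the kernel already wipes it.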
> +
> +static void
> +arc4random_allocate_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
> +}
> +
> +static void
> +arc4random_getrandom_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
> +}
> +
> +/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
> +   the fork callback will reset the state on the fork call.  It does not
> +   handle direct clone calls, nor vfork or _Fork (arc4random is not
> +   async-signal-safe due the internal lock usage).  */
> +static void
> +arc4random_init (uint8_t *buf, size_t len)
> +{
> +  state = __mmap (NULL, sizeof (struct arc4random_state),
> +                 PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +  if (state == MAP_FAILED)
> +    arc4random_allocate_failure ();
> +
> +#ifdef MADV_WIPEONFORK
> +  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
> +  if (r == 0)
> +    __arc4random_wipeonfork = true;
> +  else if (errno != EINVAL)
> +    arc4random_allocate_failure ();
> +#endif
> +
> +  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
> +}
> +
> +#define min(x,y) (((x) > (y)) ? (y) : (x))
> +
> +static void
> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
> +{
> +  memset (state->buf, 0, sizeof state->buf);
> +  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +
> +  /* Mix some extra entropy if provided.  */
> +  if (rnd != NULL)
> +    {
> +      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +      for (size_t i = 0; i < m; i++)
> +       state->buf[i] ^= rnd[i];
> +    }
> +
> +  /* Immediately reinit for backtracking resistance.  */
> +  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> +  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +}
> +
> +static void
> +arc4random_getentropy (uint8_t *rnd, size_t len)
> +{
> +  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +    return;
> +
> +  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);
> +  if (fd != -1)
> +    {
> +      unsigned char *p = rnd;
> +      unsigned char *end = p + len;
> +      do
> +       {
> +         ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> +         if (ret <= 0)
> +           arc4random_getrandom_failure ();
> +         p += ret;
> +       }
> +      while (p < end);
> +
> +      if (__close_nocancel (fd) != 0)
> +       return;
> +    }
> +  arc4random_getrandom_failure ();
> +}
> +
> +/* Either allocates the state buffer or reinit it by reseeding the cipher
> +   state with kernel entropy.  */
> +static void
> +arc4random_stir (void)
> +{
> +  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> +  arc4random_getentropy (rnd, sizeof rnd);
> +
> +  if (state == NULL)
> +    arc4random_init (rnd, sizeof rnd);
> +  else
> +    arc4random_rekey (rnd, sizeof rnd);
> +
> +  explicit_bzero (rnd, sizeof rnd);
> +
> +  state->have = 0;
> +  memset (state->buf, 0, sizeof state->buf);
> +  state->count = CHACHE_RESEED_SIZE;
> +}
> +
> +static void
> +arc4random_check_stir (size_t len)
> +{
> +  if (state == NULL || state->count < len)
> +    arc4random_stir ();
> +  if (state->count <= len)
> +    state->count = 0;
> +  else
> +    state->count -= len;
> +}
> +
> +void
> +__arc4random_buf_internal (void *buffer, size_t len)
> +{
> +  arc4random_check_stir (len);
> +
> +  while (len > 0)
> +    {
> +      if (state->have > 0)
> +       {
> +         size_t m = min (len, state->have);
> +         uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +         memcpy (buffer, ks, m);
> +         memset (ks, 0, m);
> +         buffer += m;
> +         len -= m;
> +         state->have -= m;
> +       }
> +      if (state->have == 0)
> +       arc4random_rekey (NULL, 0);
> +    }
> +}
> +
> +void
> +__arc4random_buf (void *buffer, size_t len)
> +{
> +  __libc_lock_lock (__arc4random_lock);
> +  __arc4random_buf_internal (buffer, len);
> +  __libc_lock_unlock (__arc4random_lock);
> +}
> +libc_hidden_def (__arc4random_buf)
> +weak_alias (__arc4random_buf, arc4random_buf)
> +
> +
> +static uint32_t
> +__arc4random_internal (void)
> +{
> +  uint32_t r;
> +
> +  arc4random_check_stir (sizeof (uint32_t));
> +  if (state->have < sizeof (uint32_t))
> +    arc4random_rekey (NULL, 0);
> +  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +  memcpy (&r, ks, sizeof (uint32_t));
> +  memset (ks, 0, sizeof (uint32_t));
> +  state->have -= sizeof (uint32_t);
> +
> +  return r;
> +}
> +
> +uint32_t
> +__arc4random (void)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = __arc4random_internal ();
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random)
> +weak_alias (__arc4random, arc4random)
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> new file mode 100644
> index 0000000000..96ffe62df1
> --- /dev/null
> +++ b/stdlib/arc4random_uniform.c
> @@ -0,0 +1,152 @@
> +/* Random pseudo generator numbers between 0 and 2**-31 (inclusive)
> +   uniformly distributed but with an upper_bound.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <endian.h>
> +#include <libc-lock.h>
> +#include <stdlib.h>
> +#include <sys/param.h>
> +
> +/* Return the number of bytes which cover values up to the limit.  */
> +__attribute__ ((const))
> +static uint32_t
> +byte_count (uint32_t n)
> +{
> +  if (n <= (1U << 8))
> +    return 1;
> +  else if (n <= (1U << 16))
> +    return 2;
> +  else if (n <= (1U << 24))
> +    return 3;
> +  else
> +    return 4;
> +}
> +
> +/* Fill the lower bits of the result with randomness, according to the
> +   number of bytes requested.  */
> +static void
> +random_bytes (uint32_t *result, uint32_t byte_count)
> +{
> +  *result = 0;
> +  unsigned char *ptr = (unsigned char *) result;
> +  if (__BYTE_ORDER == __BIG_ENDIAN)
> +    ptr += 4 - byte_count;
> +  __arc4random_buf_internal (ptr, byte_count);
> +}
> +
> +static uint32_t
> +compute_uniform (uint32_t n)
> +{
> +  if (n <= 1)
> +    /* There is no valid return value for a zero limit, and 0 is the
> +       only possible result for limit 1.  */
> +    return 0;
> +
> +  /* The bits variable serves as a source for bits.  Prefetch the
> +     minimum number of bytes needed.  */
> +  unsigned count = byte_count (n);
> +  uint32_t bits_length = count * CHAR_BIT;
> +  uint32_t bits;
> +  random_bytes (&bits, count);
> +
> +  /* Powers of two are easy.  */
> +  if (powerof2 (n))
> +    return bits & (n - 1);
> +
> +  /* The general case.  This algorithm follows Jérémie Lumbroso,
> +     Optimal Discrete Uniform Generation from Coin Flips, and
> +     Applications (2013), who credits Donald E. Knuth and Andrew
> +     C. Yao, The complexity of nonuniform random number generation
> +     (1976), for solving the general case.
> +
> +     The implementation below unrolls the initialization stage of the
> +     loop, where v is less than n.  */
> +
> +  /* Use 64-bit variables even though the intermediate results are
> +     never larger that 33 bits.  This ensures the code easier to
                               than
> +     compile on 64-bit architectures.  */
> +  uint64_t v;
> +  uint64_t c;
> +
> +  /* Initialize v and c.  v is the smallest power of 2 which is larger
> +     than n.*/
> +  {
> +    uint32_t log2p1 = 32 - __builtin_clz (n);
> +    v = 1ULL << log2p1;
> +    c = bits & (v - 1);
> +    bits >>= log2p1;
> +    bits_length -= log2p1;
> +  }
> +
> +  /* At the start of the loop, c is uniformly distributed within the
> +     half-open interval [0, v), and v < 2n < 2**33.  */
> +  while (true)
> +    {
> +      if (v >= n)
> +        {
> +          /* If the candidate is less than n, accept it.  */
> +          if (c < n)
> +            /* c is uniformly distributed on [0, n).  */
> +            return c;
> +          else
> +            {
> +              /* c is uniformly distributed on [n, v).  */
> +              v -= n;
> +              c -= n;
> +              /* The distribution was shifted, so c is uniformly
> +                 distributed on [0, v) again.  */
> +            }
> +        }
> +      /* v < n here.  */
> +
> +      /* Replenish the bit source if necessary.  */
> +      if (bits_length == 0)
> +        {
> +          /* Overwrite the least significant byte.  */
> +         random_bytes (&bits, 1);
> +         bits_length = CHAR_BIT;
> +        }
> +
> +      /* Double the range.  No overflow because v < n < 2**32.  */
> +      v *= 2;
> +      /* v < 2n here.  */
> +
> +      /* Extract a bit and append it to c.  c remains less than v and
> +         thus 2**33.  */
> +      c = (c << 1) | (bits & 1);
> +      bits >>= 1;
> +      --bits_length;
> +
> +      /* At this point, c is uniformly distributed on [0, v) again,
> +         and v < 2n < 2**33.  */
> +    }
> +}
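
For readers following the comments in compute_uniform, the accept/reject
loop can be sketched in isolation as below.  This is a simplified
illustration, not the patch's code: it draws one bit per iteration instead of
prefetching bytes, and demo_next_bit is a hypothetical, deterministic and
non-cryptographic stand-in for __arc4random_buf_internal so that the sketch
is reproducible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit source (xorshift32).  NOT cryptographically secure; it
   only stands in for the real generator.  */
static uint32_t demo_seed = 0x9e3779b9U;

static uint32_t
demo_next_bit (void)
{
  demo_seed ^= demo_seed << 13;
  demo_seed ^= demo_seed >> 17;
  demo_seed ^= demo_seed << 5;
  return (demo_seed >> 16) & 1;
}

/* Return a value in [0, n) for n >= 1.  Invariant: c is uniformly
   distributed on [0, v) and v < 2n < 2**33, so 64-bit variables cannot
   overflow.  */
static uint32_t
demo_uniform (uint32_t n)
{
  uint64_t v = 1;
  uint64_t c = 0;
  for (;;)
    {
      /* Double the range and append one fresh bit to the candidate.  */
      v *= 2;
      c = (c << 1) | demo_next_bit ();
      if (v >= n)
        {
          /* Accept: c is uniformly distributed on [0, n).  */
          if (c < n)
            return (uint32_t) c;
          /* Reject: c is uniformly distributed on [n, v); shift it back
             so that it is uniform on [0, v - n) and keep going.  */
          v -= n;
          c -= n;
        }
    }
}

/* Usage: every result must be below the bound.  */
static void
demo_uniform_self_check (void)
{
  for (uint32_t n = 1; n <= 100; n++)
    assert (demo_uniform (n) < n);
}
```

The rejected candidate is not thrown away: the leftover range [0, v - n) is
reused, which is what keeps the expected number of consumed bits low.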
> +
> +__libc_lock_define (extern , __arc4random_lock attribute_hidden)
> +
> +uint32_t
> +__arc4random_uniform (uint32_t upper_bound)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = compute_uniform (upper_bound);
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random_uniform)
> +weak_alias (__arc4random_uniform, arc4random_uniform)
> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> new file mode 100644
> index 0000000000..af4ffa9860
> --- /dev/null
> +++ b/stdlib/chacha20.c
> @@ -0,0 +1,163 @@
> +/* Generic ChaCha20 implementation (used on arc4random).
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <array_length.h>
> +#include <endian.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <string.h>
> +
> +/* 32-bit stream position, then 96-bit nonce.  */
> +#define CHACHA20_IV_SIZE       16
> +#define CHACHA20_KEY_SIZE      32
> +
> +#define CHACHA20_BLOCK_SIZE     64
> +#define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
> +
> +#define CHACHA20_STATE_LEN     16
> +
> +/* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
> +   Sby RFC8439.  Since the input stream will either zero bytes (initial state)
         by
> +   or the PRNG output itself this step does not add any extra entropy.   */
> +
> +enum chacha20_constants
> +{
> +  CHACHA20_CONSTANT_EXPA = 0x61707865U,
> +  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
> +  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
> +  CHACHA20_CONSTANT_TE_K = 0x6b206574U
> +};
> +
> +static inline uint32_t
> +read_unaligned_32 (const uint8_t *p)
> +{
> +  uint32_t r;
> +  memcpy (&r, p, sizeof (r));
> +  return r;
> +}
> +
> +static inline void
> +write_unaligned_32 (uint8_t *p, uint32_t v)
> +{
> +  memcpy (p, &v, sizeof (v));
> +}
> +
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
> +# define set_state(v)          __builtin_bswap32 ((v))
> +#else
> +# define read_unaligned_le32(p) read_unaligned_32 ((p))
> +# define set_state(v)          (v)
> +#endif
> +
> +static inline void
> +chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
> +{
> +  state[0]  = CHACHA20_CONSTANT_EXPA;
> +  state[1]  = CHACHA20_CONSTANT_ND_3;
> +  state[2]  = CHACHA20_CONSTANT_2_BY;
> +  state[3]  = CHACHA20_CONSTANT_TE_K;
> +
> +  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
> +  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
> +  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
> +  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
> +  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
> +  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
> +  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
> +  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
> +
> +  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
> +  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
> +  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
> +  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
> +}
> +
> +static inline uint32_t
> +rotl32 (unsigned int shift, uint32_t word)
> +{
> +  return (word << (shift & 31)) | (word >> ((-shift) & 31));
> +}
> +
> +#define QROUND(x0, x1, x2, x3)                         \
> +  do {                                         \
> +   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3));  \
> +   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2));  \
> +   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));  \
> +   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));  \
> +  } while(0)
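
A quick sanity check for the macro, as a minimal sketch: rotl32 and QROUND
are copied from the patch, and the input and expected values are the
quarter-round test vector from RFC 8439 section 2.1.1.  The names
qround_matches_rfc8439 and qround_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t
rotl32 (unsigned int shift, uint32_t word)
{
  return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

#define QROUND(x0, x1, x2, x3)                  \
  do {                                          \
   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3));   \
   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2));   \
   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));   \
   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));   \
  } while(0)

/* Return 1 if QROUND reproduces the RFC 8439 section 2.1.1 test vector,
   0 otherwise.  */
static int
qround_matches_rfc8439 (void)
{
  uint32_t a = 0x11111111U;
  uint32_t b = 0x01020304U;
  uint32_t c = 0x9b8d6f43U;
  uint32_t d = 0x01234567U;

  QROUND (a, b, c, d);

  return (a == 0xea2a92f4U && b == 0xcb1cf8ceU
          && c == 0x4581472eU && d == 0x5881c4bbU);
}

/* Usage: abort if the macro diverges from the RFC vector.  */
static void
qround_self_check (void)
{
  assert (qround_matches_rfc8439 () == 1);
}
```

A check of this kind could also serve as a cheap regression test for the
architecture-specific implementations, provided they expose a comparable
entry point.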
> +
> +static inline void
> +chacha20_block (uint32_t *state, uint32_t *stream)
> +{
> +  uint32_t x[CHACHA20_STATE_LEN];
> +  memcpy (x, state, sizeof x);
> +
> +  for (int i = 0; i < 20; i += 2)
> +    {
> +      QROUND (x[0], x[4], x[8],  x[12]);
> +      QROUND (x[1], x[5], x[9],  x[13]);
> +      QROUND (x[2], x[6], x[10], x[14]);
> +      QROUND (x[3], x[7], x[11], x[15]);
> +
> +      QROUND (x[0], x[5], x[10], x[15]);
> +      QROUND (x[1], x[6], x[11], x[12]);
> +      QROUND (x[2], x[7], x[8],  x[13]);
> +      QROUND (x[3], x[4], x[9],  x[14]);
> +    }
> +
> +  /* Unroll the loop a bit.  */
> +  for (int i = 0; i < CHACHA20_BLOCK_WORDS / 4; i++)
> +    {
> +      stream[i*4+0] = set_state (x[i*4+0] + state[i*4+0]);
> +      stream[i*4+1] = set_state (x[i*4+1] + state[i*4+1]);
> +      stream[i*4+2] = set_state (x[i*4+2] + state[i*4+2]);
> +      stream[i*4+3] = set_state (x[i*4+3] + state[i*4+3]);
> +    }
> +
> +  state[12]++;
> +}
> +
> +static void
> +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> +               size_t bytes)
> +{
> +  uint32_t stream[CHACHA20_BLOCK_WORDS];
> +
> +  while (bytes >= CHACHA20_BLOCK_SIZE)
> +    {
> +      chacha20_block (state, stream);
> +#ifdef CHACHA20_XOR_FINAL
> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
> +#endif
> +      memcpy (dst, stream, CHACHA20_BLOCK_SIZE);
> +      bytes -= CHACHA20_BLOCK_SIZE;
> +      dst += CHACHA20_BLOCK_SIZE;
> +      src += CHACHA20_BLOCK_SIZE;
> +    }
> +  if (bytes != 0)
> +    {
> +      chacha20_block (state, stream);
> +#ifdef CHACHA20_XOR_FINAL
> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
> +#endif
> +      memcpy (dst, stream, bytes);
> +    }
> +}
> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
> index bf7cd438e1..f2b0c83c12 100644
> --- a/stdlib/stdlib.h
> +++ b/stdlib/stdlib.h
> @@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
>  extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
>
>  # ifdef __USE_MISC
> +#  include <bits/stdint-uintn.h>
>  /* Data structure for communication with thread safe versions.  This
>     type is to be regarded as opaque.  It's only exported because users
>     have to allocate objects of this type.  */
> @@ -533,6 +534,19 @@ extern int seed48_r (unsigned short int __seed16v[3],
>  extern int lcong48_r (unsigned short int __param[7],
>                       struct drand48_data *__buffer)
>       __THROW __nonnull ((1, 2));
> +
> +/* Return a random integer between zero and 2**31-1 (inclusive).  */
> +extern uint32_t arc4random (void)
> +     __THROW __wur;
> +
> +/* Fill the buffer with random data.  */
> +extern void arc4random_buf (void *__buf, size_t __size)
> +     __THROW __nonnull ((1));
> +
> +/* Return a random number between zero (inclusive) and the specified
> +   limit (exclusive).  */
> +extern uint32_t arc4random_uniform (uint32_t __upper_bound)
> +     __THROW __wur;
>  # endif        /* Use misc.  */
>  #endif /* Use misc or X/Open.  */
>
> diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
> index 2104efeb54..f4882a9ffd 100644
> --- a/sysdeps/generic/not-cancel.h
> +++ b/sysdeps/generic/not-cancel.h
> @@ -48,5 +48,7 @@
>    (void) __writev (fd, iov, n)
>  #define __fcntl64_nocancel(fd, cmd, ...) \
>    __fcntl64 (fd, cmd, __VA_ARGS__)
> +#define __getrandomn_nocancel(buf, size, flags) \
> +  __getrandom (buf, size, flags)
>
>  #endif /* NOT_CANCEL_H  */
> diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
> index 4dc87e9061..7bd565103b 100644
> --- a/sysdeps/mach/hurd/i386/libc.abilist
> +++ b/sysdeps/mach/hurd/i386/libc.abilist
> @@ -2289,6 +2289,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 close_range F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
> index 6ec92ced84..39edfe76b6 100644
> --- a/sysdeps/mach/hurd/not-cancel.h
> +++ b/sysdeps/mach/hurd/not-cancel.h
> @@ -74,6 +74,9 @@ __typeof (__fcntl) __fcntl_nocancel;
>  #define __fcntl64_nocancel(...) \
>    __fcntl_nocancel (__VA_ARGS__)
>
> +#define __getrandomn_nocancel(buf, size, flags) \
> +  __getrandom (buf, size, flags)
> +
>  #if IS_IN (libc)
>  hidden_proto (__close_nocancel)
>  hidden_proto (__close_nocancel_nostatus)
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> index 1b63d9e447..f8f38bb205 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> @@ -2616,3 +2616,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> index e7e4cf7d2a..9de1726de0 100644
> --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> @@ -2713,6 +2713,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
> index bc3d228e31..16e2532838 100644
> --- a/sysdeps/unix/sysv/linux/arc/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
> @@ -2377,3 +2377,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> index db7039c4ab..ae9e465088 100644
> --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> @@ -496,6 +496,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> index d2add4fb49..b669f43194 100644
> --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> @@ -493,6 +493,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
> index 355d72a30c..42daa90248 100644
> --- a/sysdeps/unix/sysv/linux/csky/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
> @@ -2652,3 +2652,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> index 3df39bb28c..090be20f53 100644
> --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> @@ -2601,6 +2601,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
> index c4da358f80..6b7cf064bb 100644
> --- a/sysdeps/unix/sysv/linux/i386/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
> @@ -2785,6 +2785,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> index 241bac70ea..3e766f64dd 100644
> --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> @@ -2551,6 +2551,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> index 78bf372b72..c0b99199a8 100644
> --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> @@ -497,6 +497,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0x98
>  GLIBC_2.4 _IO_2_1_stdin_ D 0x98
> diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> index 00df5c901f..4d0be7c86d 100644
> --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> @@ -2728,6 +2728,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> index e8118569c3..b944680ede 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> @@ -2701,3 +2701,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> index c0d2373e64..28f7d19983 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> @@ -2698,3 +2698,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> index 2d0fd04f54..3da7cdaca5 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> @@ -2693,6 +2693,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> index e39ccfb312..9fe87f15be 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> @@ -2691,6 +2691,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> index 1e900f86e4..c14fca2111 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> @@ -2699,6 +2699,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> index 9145ba7931..a363830226 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> @@ -2602,6 +2602,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> index e95d60d926..89b6f98667 100644
> --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> @@ -2740,3 +2740,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
> index 75b9e0ee1e..be5df35927 100644
> --- a/sysdeps/unix/sysv/linux/not-cancel.h
> +++ b/sysdeps/unix/sysv/linux/not-cancel.h
> @@ -67,6 +67,13 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
>    INTERNAL_SYSCALL_CALL (writev, fd, iov, iovcnt);
>  }
>
> +static inline int
> +__getrandomn_nocancel (void *buf, size_t buflen, unsigned int flags)
> +{
> +  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
> +}
> +
> +
>  /* Uncancelable fcntl.  */
>  __typeof (__fcntl) __fcntl64_nocancel;
>
> diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> index ca934e374b..94c0ff9526 100644
> --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> @@ -2123,3 +2123,6 @@ GLIBC_2.35 wprintf F
>  GLIBC_2.35 write F
>  GLIBC_2.35 writev F
>  GLIBC_2.35 wscanf F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> index 3820b9f235..d6188de00b 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> @@ -2755,6 +2755,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> index 464dc27fcd..8201230059 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> @@ -2788,6 +2788,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> index 2f7e58747f..623505d783 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> @@ -2510,6 +2510,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> index 4f3043d913..23b0d83408 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> @@ -2812,3 +2812,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index 84b6ac815a..a72e8ed9cc 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2379,3 +2379,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index 4d5c19c56a..f3faecc2ae 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2579,3 +2579,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> index 7c5ee8d569..105e5a9231 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> @@ -2753,6 +2753,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> index 50de0b46cf..c08c6c8301 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> @@ -2547,6 +2547,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> index 66fba013ca..8ec1005644 100644
> --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> @@ -2608,6 +2608,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> index 38703f8aa0..5d776576f9 100644
> --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> @@ -2605,6 +2605,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> index 6df55eb765..f5f07f612e 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> @@ -2748,6 +2748,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> index b90569d881..be687ebe02 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> @@ -2574,6 +2574,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> index e88b0f101f..7f456fbb55 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> @@ -2525,6 +2525,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> index e0755272eb..c737201248 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> @@ -2631,3 +2631,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> --
> 2.32.0
>


-- 
H.J.

* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-19 21:52   ` H.J. Lu
@ 2022-04-20 12:38     ` Adhemerval Zanella
  0 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-20 12:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GNU C Library, Florian Weimer



On 19/04/2022 18:52, H.J. Lu wrote:
> On Tue, Apr 19, 2022 at 2:29 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>>
>> diff --git a/NEWS b/NEWS
>> index 4b6d9de2b5..4d9d95b35b 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,7 +9,9 @@ Version 2.36
>>
>>  Major new features:
>>
>> -  [Add new features here]
>> +* The functions arc4random, arc4random_buf, arc4random_uniform have been
>> +  added.  The functions use a cryptographic pseudo-random number generator
>> +  based on ChaCha20 initilized with entropy from kernel.
>                                          ^^^^^^^^ Typo.
>>
>>  Deprecated and removed features, and other changes affecting compatibility:

Ack.


>> +
>> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
>> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
>> +   number of bytes until a reseed.
>> +
>> +   Both the initial seed an reseed tries to obtain entropy from the kernel
>                                          ^^^^^^^^^^^^^^^^ Typo?
>> +   and abort the process if none could be obtained.
>> +
>> +   The state 'buf' improves the usage of the cipher call, allowing to call
>> +   optimized implementations (if the archictecture provides it) and optimize
>                                                               ^^^^^^^^^^^^ Typo?

Ack.

>> +
>> +  /* The general case.  This algorithm follows Jérémie Lumbroso,
>> +     Optimal Discrete Uniform Generation from Coin Flips, and
>> +     Applications (2013), who credits Donald E. Knuth and Andrew
>> +     C. Yao, The complexity of nonuniform random number generation
>> +     (1976), for solving the general case.
>> +
>> +     The implementation below unrolls the initialization stage of the
>> +     loop, where v is less than n.  */
>> +
>> +  /* Use 64-bit variables even though the intermediate results are
>> +     never larger that 33 bits.  This ensures the code easier to
>                                than

Ack.

>> +     compile on 64-bit architectures.  */
>> +  uint64_t v;
>> +  uint64_t c;
>> +
>> +  /* Initialize v and c.  v is the smallest power of 2 which is larger
>> +     than n.*/
>> +  {
>> +    uint32_t log2p1 = 32 - __builtin_clz (n);
>> +    v = 1ULL << log2p1;
>> +    c = bits & (v - 1);
>> +    bits >>= log2p1;
>> +    bits_length -= log2p1;
>> +  }
>> +
>> +  /* At the start of the loop, c is uniformly distributed within the
>> +     half-open interval [0, v), and v < 2n < 2**33.  */
>> +  while (true)
>> +    {
>> +      if (v >= n)
>> +        {
>> +          /* If the candidate is less than n, accept it.  */
>> +          if (c < n)
>> +            /* c is uniformly distributed on [0, n).  */
>> +            return c;
>> +          else
>> +            {
>> +              /* c is uniformly distributed on [n, v).  */
>> +              v -= n;
>> +              c -= n;
>> +              /* The distribution was shifted, so c is uniformly
>> +                 distributed on [0, v) again.  */
>> +            }
>> +        }
>> +      /* v < n here.  */
>> +
>> +      /* Replenish the bit source if necessary.  */
>> +      if (bits_length == 0)
>> +        {
>> +          /* Overwrite the least significant byte.  */
>> +         random_bytes (&bits, 1);
>> +         bits_length = CHAR_BIT;
>> +        }
>> +
>> +      /* Double the range.  No overflow because v < n < 2**32.  */
>> +      v *= 2;
>> +      /* v < 2n here.  */
>> +
>> +      /* Extract a bit and append it to c.  c remains less than v and
>> +         thus 2**33.  */
>> +      c = (c << 1) | (bits & 1);
>> +      bits >>= 1;
>> +      --bits_length;
>> +
>> +      /* At this point, c is uniformly distributed on [0, v) again,
>> +         and v < 2n < 2**33.  */
>> +    }
>> +}
>> +
>> +__libc_lock_define (extern , __arc4random_lock attribute_hidden)
>> +
>> +uint32_t
>> +__arc4random_uniform (uint32_t upper_bound)
>> +{
>> +  uint32_t r;
>> +  __libc_lock_lock (__arc4random_lock);
>> +  r = compute_uniform (upper_bound);
>> +  __libc_lock_unlock (__arc4random_lock);
>> +  return r;
>> +}
>> +libc_hidden_def (__arc4random_uniform)
>> +weak_alias (__arc4random_uniform, arc4random_uniform)
>> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
>> new file mode 100644
>> index 0000000000..af4ffa9860
>> --- /dev/null
>> +++ b/stdlib/chacha20.c
>> @@ -0,0 +1,163 @@
>> +/* Generic ChaCha20 implementation (used on arc4random).
>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <array_length.h>
>> +#include <endian.h>
>> +#include <stddef.h>
>> +#include <stdint.h>
>> +#include <string.h>
>> +
>> +/* 32-bit stream position, then 96-bit nonce.  */
>> +#define CHACHA20_IV_SIZE       16
>> +#define CHACHA20_KEY_SIZE      32
>> +
>> +#define CHACHA20_BLOCK_SIZE     64
>> +#define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
>> +
>> +#define CHACHA20_STATE_LEN     16
>> +
>> +/* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
>> +   Sby RFC8439.  Since the input stream will either zero bytes (initial state)
>          by

Ack.
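
For reference, the bit-by-bit rejection loop quoted above can be exercised
on its own.  This is only a minimal sketch, not the code from the patch: it
does not unroll the initialization stage, and it swaps the ChaCha20-backed
bit source for a hypothetical xorshift32 generator (struct bit_source,
next_word, next_bit and uniform_sketch are illustrative names, and
xorshift32 is not cryptographically secure).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, NON-cryptographic bit source standing in for the
   ChaCha20 output buffer.  'state' must be non-zero.  */
struct bit_source
{
  uint32_t state;
  uint32_t bits;
  unsigned int bits_length;
};

/* xorshift32 step; an assumption used only to make the sketch runnable.  */
static uint32_t
next_word (struct bit_source *s)
{
  uint32_t x = s->state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  s->state = x;
  return x;
}

/* Return one bit, replenishing the 32-bit pool when it runs dry.  */
static unsigned int
next_bit (struct bit_source *s)
{
  if (s->bits_length == 0)
    {
      s->bits = next_word (s);
      s->bits_length = 32;
    }
  unsigned int b = s->bits & 1;
  s->bits >>= 1;
  s->bits_length--;
  return b;
}

/* Return a value in [0, n).  c is kept uniformly distributed on [0, v).  */
static uint32_t
uniform_sketch (struct bit_source *s, uint32_t n)
{
  assert (n > 0);
  /* 64-bit variables: v stays at or below 2n, which is below 2**33.  */
  uint64_t v = 1;
  uint64_t c = 0;
  for (;;)
    {
      /* Double the range and append one random bit to the candidate.  */
      v *= 2;
      c = (c << 1) | next_bit (s);
      if (v >= n)
        {
          if (c < n)
            /* Accept: c is uniformly distributed on [0, n).  */
            return (uint32_t) c;
          /* Reject: c is uniform on [n, v), so shift it back to [0, v - n).  */
          v -= n;
          c -= n;
        }
    }
}
```

Rejected candidates are not thrown away; the leftover range [0, v - n) is
reused, which is why the loop consumes few bits per call on average.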

* Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20
  2022-04-19 21:28 ` [PATCH v3 7/9] powerpc64: Add " Adhemerval Zanella
@ 2022-04-20 18:38   ` Paul E Murphy
  2022-04-20 19:23     ` Adhemerval Zanella
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E Murphy @ 2022-04-20 18:38 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha



On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote:
> It adds vectorized ChaCha20 implementation based on libgcrypt
> cipher/chacha20-ppc.c.  It targets POWER8 and it is used on
> default for LE.

> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
> new file mode 100644
> index 0000000000..e2567c379a
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c

How difficult is it to keep this synchronized with the upstream version
in libgcrypt?  Also, this seems like it would be better placed in the
power8 subdirectory.

> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
> new file mode 100644
> index 0000000000..a18115392f
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
> @@ -0,0 +1,47 @@
> +/* PowerPC optimization for ChaCha20.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdbool.h>
> +#include <ldsodefs.h>
> +
> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> +					const uint8_t *src, size_t nblks)
> +     attribute_hidden;
> +
> +static void
> +chacha20_crypt (uint32_t *state, uint8_t *dst,
> +		const uint8_t *src, size_t bytes)
> +{
> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> +		  "CHACHA20_BUFSIZE not multiple of 4");
> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> +		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> +
> +#ifdef __LITTLE_ENDIAN__
> +  __chacha20_power8_blocks4 (state, dst, src,
> +			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> +#else
> +  unsigned long int hwcap = GLRO(dl_hwcap);
> +  unsigned long int hwcap2 = GLRO(dl_hwcap2);
> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> +    __chacha20_power8_blocks4 (state, dst, src,
> +			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> +  else
> +    chacha20_crypt_generic (state, dst, src, bytes);
> +#endif

This file doesn't seem to obey the multiarch conventions of other 
powerpc64 specific bits. Is it possible to implement multiarch support 
similar to the libc/libm routines?

* Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20
  2022-04-20 18:38   ` Paul E Murphy
@ 2022-04-20 19:23     ` Adhemerval Zanella
  2022-04-22 21:09       ` Paul E Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-20 19:23 UTC (permalink / raw)
  To: Paul E Murphy, libc-alpha



On 20/04/2022 15:38, Paul E Murphy wrote:
> 
> 
> On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote:
>> It adds vectorized ChaCha20 implementation based on libgcrypt
>> cipher/chacha20-ppc.c.  It targets POWER8 and it is used on
>> default for LE.
> 
>> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
>> new file mode 100644
>> index 0000000000..e2567c379a
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
> 
> How difficult is it to keep this synchronized with the upstream version in libgcrypt?  Also, this seems like it would be a better placed in the power8 subdirectory.

It would be somewhat complicated, because libgcrypt also implements
poly1305 in the same file (sharing common macros and definitions with
chacha20) and it applies the final XOR against the input stream (which
arc4random does not need, since it adds no hardening).

Keeping the two in sync would require refactoring the libgcrypt code a
bit: splitting chacha20 from poly1305 and adding a macro that controls
the XOR with the input.
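
To illustrate what the optional final XOR amounts to, here is a minimal
scalar sketch of one ChaCha20 block following RFC 8439.  It is not the
libgcrypt or glibc code: chacha20_block_sketch is a hypothetical name, and
the XOR is gated by a NULL check on 'src' rather than by a compile-time
macro such as CHACHA20_XOR_FINAL, only so that both paths can be exercised
in one build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* ChaCha quarter round as defined by RFC 8439, section 2.1.  */
#define QROUND(a, b, c, d)                \
  do                                      \
    {                                     \
      a += b; d ^= a; d = ROTL32 (d, 16); \
      c += d; b ^= c; b = ROTL32 (b, 12); \
      a += b; d ^= a; d = ROTL32 (d, 8);  \
      c += d; b ^= c; b = ROTL32 (b, 7);  \
    }                                     \
  while (0)

/* Produce one 64-byte block from the 16-word STATE.  If SRC is NULL the
   raw keystream is written to DST (the arc4random use case); otherwise
   the keystream is XORed with SRC, which is the RFC 8439 final step.  */
static void
chacha20_block_sketch (const uint32_t state[16], uint8_t dst[64],
                       const uint8_t *src)
{
  assert (state != NULL && dst != NULL);

  uint32_t x[16];
  memcpy (x, state, sizeof x);

  /* 20 rounds: 10 iterations of column rounds plus diagonal rounds.  */
  for (int i = 0; i < 10; i++)
    {
      QROUND (x[0], x[4], x[8], x[12]);
      QROUND (x[1], x[5], x[9], x[13]);
      QROUND (x[2], x[6], x[10], x[14]);
      QROUND (x[3], x[7], x[11], x[15]);
      QROUND (x[0], x[5], x[10], x[15]);
      QROUND (x[1], x[6], x[11], x[12]);
      QROUND (x[2], x[7], x[8], x[13]);
      QROUND (x[3], x[4], x[9], x[14]);
    }

  /* Add the input state and serialize each word in little-endian order.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t w = x[i] + state[i];
      for (int j = 0; j < 4; j++)
        {
          uint8_t k = (uint8_t) (w >> (8 * j));
          dst[4 * i + j] = src != NULL ? (uint8_t) (k ^ src[4 * i + j]) : k;
        }
    }
}
```

With an all-zero 'src' both paths give the same bytes, so for a zero input
the XOR is plain extra work.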

> 
>> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>> new file mode 100644
>> index 0000000000..a18115392f
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>> @@ -0,0 +1,47 @@
>> +/* PowerPC optimization for ChaCha20.
>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <stdbool.h>
>> +#include <ldsodefs.h>
>> +
>> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
>> +                    const uint8_t *src, size_t nblks)
>> +     attribute_hidden;
>> +
>> +static void
>> +chacha20_crypt (uint32_t *state, uint8_t *dst,
>> +        const uint8_t *src, size_t bytes)
>> +{
>> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
>> +          "CHACHA20_BUFSIZE not multiple of 4");
>> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
>> +          "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
>> +
>> +#ifdef __LITTLE_ENDIAN__
>> +  __chacha20_power8_blocks4 (state, dst, src,
>> +                 CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>> +#else
>> +  unsigned long int hwcap = GLRO(dl_hwcap);
>> +  unsigned long int hwcap2 = GLRO(dl_hwcap2);
>> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
>> +    __chacha20_power8_blocks4 (state, dst, src,
>> +                   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>> +  else
>> +    chacha20_crypt_generic (state, dst, src, bytes);
>> +#endif
> 
> This file doesn't seem to obey the multiarch conventions of other powerpc64 specific bits. Is it possible to implement multiarch support similar to the libc/libm routines?

I am not very fond of the powerpc multiarch convention, and it would
require some more boilerplate code to handle BE, but it is doable.

So LE will continue to use __chacha20_power8_blocks4 by default, while
BE will select it only if --with-arch=power8 is defined for the
default build.  With --disable-multi-arch, the power8 version will be
selected iff --with-arch=power8 is set.

---

diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile
index 18943ef09e..679d5e49ba 100644
--- a/sysdeps/powerpc/powerpc64/Makefile
+++ b/sysdeps/powerpc/powerpc64/Makefile
@@ -66,9 +66,6 @@ tst-setjmp-bug21895-static-ENV = \
 endif
 
 ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-
 CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec
 tests += tst-ucontext-ppc64-vscr
 endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
new file mode 100644
index 0000000000..8c75165f7f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
new file mode 100644
index 0000000000..cf9e735326
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
@@ -0,0 +1 @@
+#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
similarity index 92%
rename from sysdeps/powerpc/powerpc64/chacha20_arch.h
rename to sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
index a18115392f..6d2762d82b 100644
--- a/sysdeps/powerpc/powerpc64/chacha20_arch.h
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
@@ -32,10 +32,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
   _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
 		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
 
-#ifdef __LITTLE_ENDIAN__
-  __chacha20_power8_blocks4 (state, dst, src,
-			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
   unsigned long int hwcap = GLRO(dl_hwcap);
   unsigned long int hwcap2 = GLRO(dl_hwcap2);
   if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
@@ -43,5 +39,4 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
 			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
   else
     chacha20_crypt_generic (state, dst, src, bytes);
-#endif
 }
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index 71a59529f3..abb0aa3f11 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,3 +1,8 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
+
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
similarity index 100%
rename from sysdeps/powerpc/powerpc64/chacha20-ppc.c
rename to sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
new file mode 100644
index 0000000000..270c71130f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
@@ -0,0 +1,37 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdbool.h>
+#include <ldsodefs.h>
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+					const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst,
+		const uint8_t *src, size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+		  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+  __chacha20_power8_blocks4 (state, dst, src,
+			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+}

* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
  2022-04-19 21:52   ` H.J. Lu
@ 2022-04-22 13:54   ` Yann Droneaud
  2022-04-25 12:15     ` Adhemerval Zanella
  2022-04-25  2:22   ` Mark Harris
  2 siblings, 1 reply; 22+ messages in thread
From: Yann Droneaud @ 2022-04-22 13:54 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha; +Cc: Florian Weimer

On 19/04/2022 at 23:28, Adhemerval Zanella via Libc-alpha wrote:
> The implementation is based on scalar Chacha20, with global cache and
> locking.  It uses getrandom or /dev/urandom as fallback to get the
> initial entropy, and reseeds the internal state on every 16MB of
> consumed buffer.
>
> It maintains an internal buffer which consumes at maximum one page on
> most systems (assuming minimum of 4k pages).  The internal buf optimizes
> the cipher encrypt calls, by amortize arc4random calls (where both
> function call and locks cost are the dominating factor).
>
> The ChaCha20 implementation is based on RFC8439 [1], with the last
> step, the XOR with the input, omitted.  Since the input stream will be
> either zero bytes (initial state) or the PRNG output itself, this step
> does not add any extra entropy.


The commit message could also state that the implementation follows
OpenBSD's current arc4random implementation.


> The arc4random_uniform is based on previous work by Florian Weimer.
>
> Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
>
> Co-authored-by: Florian Weimer <fweimer@redhat.com>
>
> [1] https://datatracker.ietf.org/doc/html/rfc8439
> ---
>   NEWS                                          |   4 +-
>   include/stdlib.h                              |  13 +
>   posix/fork.c                                  |   2 +
>   stdlib/Makefile                               |   2 +
>   stdlib/Versions                               |   5 +
>   stdlib/arc4random.c                           | 245 ++++++++++++++++++
>   stdlib/arc4random_uniform.c                   | 152 +++++++++++
>   stdlib/chacha20.c                             | 163 ++++++++++++
>   stdlib/stdlib.h                               |  14 +
>   sysdeps/generic/not-cancel.h                  |   2 +
>   sysdeps/mach/hurd/i386/libc.abilist           |   3 +
>   sysdeps/mach/hurd/not-cancel.h                |   3 +
>   sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
>   sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
>   sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
>   sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
>   sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
>   sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
>   sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
>   sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
>   sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
>   .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
>   .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
>   .../sysv/linux/microblaze/be/libc.abilist     |   3 +
>   .../sysv/linux/microblaze/le/libc.abilist     |   3 +
>   .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
>   .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
>   .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
>   .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
>   sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
>   sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
>   sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
>   .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
>   .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
>   .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
>   .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
>   .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
>   .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
>   .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
>   .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
>   sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
>   sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
>   .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
>   .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
>   .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
>   .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
>   46 files changed, 713 insertions(+), 1 deletion(-)
>   create mode 100644 stdlib/arc4random.c
>   create mode 100644 stdlib/arc4random_uniform.c
>   create mode 100644 stdlib/chacha20.c
>
> diff --git a/NEWS b/NEWS
> index 4b6d9de2b5..4d9d95b35b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,9 @@ Version 2.36
>   
>   Major new features:
>   
> -  [Add new features here]
> +* The functions arc4random, arc4random_buf, arc4random_uniform have been
> +  added.  The functions use a cryptographic pseudo-random number generator
> +  based on ChaCha20 initilized with entropy from kernel.
>   
>   Deprecated and removed features, and other changes affecting compatibility:
>   
> diff --git a/include/stdlib.h b/include/stdlib.h
> index 1c6f70b082..055f9d2965 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
>   libc_hidden_proto (grantpt)
>   libc_hidden_proto (unlockpt)
>   
> +__typeof (arc4random) __arc4random;
> +libc_hidden_proto (__arc4random);
> +__typeof (arc4random_buf) __arc4random_buf;
> +libc_hidden_proto (__arc4random_buf);
> +__typeof (arc4random_uniform) __arc4random_uniform;
> +libc_hidden_proto (__arc4random_uniform);
> +extern void __arc4random_buf_internal (void *buffer, size_t len)
> +     attribute_hidden;
> +/* Called from the fork function to reinitialize the internal lock in thte
> +   child process.  This avoids deadlocks if fork is called in multi-threaded
> +   processes.  */
> +extern void __arc4random_fork_subprocess (void) attribute_hidden;
> +
>   extern double __strtod_internal (const char *__restrict __nptr,
>   				 char **__restrict __endptr, int __group)
>        __THROW __nonnull ((1)) __wur;
> diff --git a/posix/fork.c b/posix/fork.c
> index 6b50c091f9..87d8329b46 100644
> --- a/posix/fork.c
> +++ b/posix/fork.c
> @@ -96,6 +96,8 @@ __libc_fork (void)
>   				     &nss_database_data);
>   	}
>   
> +      call_function_static_weak (__arc4random_fork_subprocess);
> +
>         /* Reset the lock the dynamic loader uses to protect its data.  */
>         __rtld_lock_initialize (GL(dl_load_lock));
>   
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index 60fc59c12c..9f9cc1bd7f 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -53,6 +53,8 @@ routines := \
>     a64l \
>     abort \
>     abs \
> +  arc4random \
> +  arc4random_uniform \
>     at_quick_exit \
>     atof \
>     atoi \
> diff --git a/stdlib/Versions b/stdlib/Versions
> index 5e9099a153..d09a308fb5 100644
> --- a/stdlib/Versions
> +++ b/stdlib/Versions
> @@ -136,6 +136,11 @@ libc {
>       strtof32; strtof64; strtof32x;
>       strtof32_l; strtof64_l; strtof32x_l;
>     }
> +  GLIBC_2.36 {
> +    arc4random;
> +    arc4random_buf;
> +    arc4random_uniform;
> +  }
>     GLIBC_PRIVATE {
>       # functions which have an additional interface since they are
>       # are cancelable.
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> new file mode 100644
> index 0000000000..cddb0e405a
> --- /dev/null
> +++ b/stdlib/arc4random.c
> @@ -0,0 +1,245 @@
> +/* Pseudo Random Number Generator based on ChaCha20.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <libc-lock.h>
> +#include <not-cancel.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/mman.h>
> +#include <sys/param.h>
> +#include <sys/random.h>
> +
> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
> +   number of bytes until a reseed.
> +
> +   Both the initial seed an reseed tries to obtain entropy from the kernel

an ->  and


> +   and abort the process if none could be obtained.
> +
> +   The state 'buf' improves the usage of the cipher call, allowing to call
> +   optimized implementations (if the archictecture provides it) and optimize
> +   arc4random calls (since only multiple call it will encrypt the next block).
> + */
> +
> +/* Maximum number bytes until reseed (16 MB).  */
> +#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
> +/* Internal buffer size in bytes (1KB).  */
> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> +
> +#include <chacha20.c>
> +
> +static struct arc4random_state
> +{
> +  uint32_t ctx[CHACHA20_STATE_LEN];
> +  size_t have;
> +  size_t count;
> +  uint8_t buf[CHACHA20_BUFSIZE];
> +} *state;
> +
> +/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
> +   it does not require to clear the internal state.  */
> +static bool __arc4random_wipeonfork = false;
> +
> +__libc_lock_define_initialized (, __arc4random_lock);
> +
> +/* Called from the fork function to reset the state if MADV_WIPEONFORK is
> +   not supported and to reinit the internal lock.  */
> +void
> +__arc4random_fork_subprocess (void)
> +{
> +  if (__arc4random_wipeonfork && state != NULL)
> +    memset (state, 0, sizeof (struct arc4random_state));
> +
> +  __libc_lock_init (__arc4random_lock);
> +}
> +
> +static void
> +arc4random_allocate_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
> +}
> +
> +static void
> +arc4random_getrandom_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
> +}
> +
> +/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
> +   the fork callback will reset the state on the fork call.  It does not
> +   handle direct clone calls, nor vfork or _Fork (arc4random is not
> +   async-signal-safe due the internal lock usage).  */
> +static void
> +arc4random_init (uint8_t *buf, size_t len)
> +{
> +  state = __mmap (NULL, sizeof (struct arc4random_state),
> +		  PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +  if (state == MAP_FAILED)
> +    arc4random_allocate_failure ();
> +
> +#ifdef MADV_WIPEONFORK
> +  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
> +  if (r == 0)
> +    __arc4random_wipeonfork = true;
> +  else if (errno != EINVAL)
> +    arc4random_allocate_failure ();
> +#endif
> +
> +  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
> +}
> +
> +#define min(x,y) (((x) > (y)) ? (y) : (x))
> +
> +static void
> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
> +{
> +  memset (state->buf, 0, sizeof state->buf);

There's no need to clear buf, as the call to chacha20_crypt() will
overwrite it (since it no longer XORs the keystream with it).

See 
https://github.com/openbsd/src/blob/master/lib/libc/crypt/arc4random.c#L121
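
To illustrate the point, here is a minimal sketch of a ChaCha20 keystream
generator that, like chacha20_crypt() without the final XOR, only ever
writes to its destination.  It follows the RFC 8439 block function; the
names sketch_init and sketch_keystream, and the RFC-style layout (32-bit
counter plus 96-bit nonce), are assumptions made for the example and are
not the patch's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline uint32_t
rotl32 (uint32_t v, int n)
{
  return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter round (RFC 8439, section 2.1).  */
#define QR(a, b, c, d)				\
  do {						\
    a += b; d ^= a; d = rotl32 (d, 16);		\
    c += d; b ^= c; b = rotl32 (b, 12);		\
    a += b; d ^= a; d = rotl32 (d, 8);		\
    c += d; b ^= c; b = rotl32 (b, 7);		\
  } while (0)

static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: build the 16-word state from key, counter, nonce.  */
static void
sketch_init (uint32_t st[16], const uint8_t key[32], uint32_t counter,
	     const uint8_t nonce[12])
{
  st[0] = 0x61707865; st[1] = 0x3320646e;
  st[2] = 0x79622d32; st[3] = 0x6b206574;
  for (int i = 0; i < 8; i++)
    st[4 + i] = load_le32 (key + 4 * i);
  st[12] = counter;
  for (int i = 0; i < 3; i++)
    st[13 + i] = load_le32 (nonce + 4 * i);
}

/* Write NBLOCKS 64-byte keystream blocks to DST.  DST is never read, so
   whatever it contained before the call has no effect on the output.  */
static void
sketch_keystream (uint32_t st[16], uint8_t *dst, size_t nblocks)
{
  for (size_t blk = 0; blk < nblocks; blk++)
    {
      uint32_t x[16];
      memcpy (x, st, sizeof x);
      for (int i = 0; i < 10; i++)
	{
	  /* Column rounds.  */
	  QR (x[0], x[4], x[8], x[12]);
	  QR (x[1], x[5], x[9], x[13]);
	  QR (x[2], x[6], x[10], x[14]);
	  QR (x[3], x[7], x[11], x[15]);
	  /* Diagonal rounds.  */
	  QR (x[0], x[5], x[10], x[15]);
	  QR (x[1], x[6], x[11], x[12]);
	  QR (x[2], x[7], x[8], x[13]);
	  QR (x[3], x[4], x[9], x[14]);
	}
      for (int i = 0; i < 16; i++)
	{
	  /* Add the input state and store little-endian; no XOR step.  */
	  uint32_t w = x[i] + st[i];
	  dst[0] = (uint8_t) w;
	  dst[1] = (uint8_t) (w >> 8);
	  dst[2] = (uint8_t) (w >> 16);
	  dst[3] = (uint8_t) (w >> 24);
	  dst += 4;
	}
      st[12]++;	/* Advance the block counter.  */
    }
}
```

Calling sketch_keystream on a buffer pre-filled with zeros or with any
other pattern yields the same bytes, which is why a memset before the
call is redundant in this kind of design.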


> +  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +
> +  /* Mix some extra entropy if provided.  */
> +  if (rnd != NULL)
> +    {
> +      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +      for (size_t i = 0; i < m; i++)
> +	state->buf[i] ^= rnd[i];
> +    }
> +
> +  /* Immediately reinit for backtracking resistance.  */
> +  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> +  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +}
> +
> +static void
> +arc4random_getentropy (uint8_t *rnd, size_t len)
> +{
> +  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +    return;
> +
> +  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);
> +  if (fd != -1)
> +    {
> +      unsigned char *p = rnd;
> +      unsigned char *end = p + len;
> +      do
> +	{
> +	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> +	  if (ret <= 0)
> +	    arc4random_getrandom_failure ();
> +	  p += ret;
> +	}
> +      while (p < end);
> +
> +      if (__close_nocancel (fd) != 0)
> +	return;
> +    }
> +  arc4random_getrandom_failure ();
> +}
> +
> +/* Either allocates the state buffer or reinit it by reseeding the cipher
> +   state with kernel entropy.  */
> +static void
> +arc4random_stir (void)
> +{
> +  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> +  arc4random_getentropy (rnd, sizeof rnd);
> +
> +  if (state == NULL)
> +    arc4random_init (rnd, sizeof rnd);
> +  else
> +    arc4random_rekey (rnd, sizeof rnd);
> +
> +  explicit_bzero (rnd, sizeof rnd);
> +
> +  state->have = 0;
> +  memset (state->buf, 0, sizeof state->buf);
> +  state->count = CHACHE_RESEED_SIZE;
> +}
> +
> +static void
> +arc4random_check_stir (size_t len)
> +{
> +  if (state == NULL || state->count < len)
> +    arc4random_stir ();
> +  if (state->count <= len)
> +    state->count = 0;
> +  else
> +    state->count -= len;
> +}
> +
> +void
> +__arc4random_buf_internal (void *buffer, size_t len)
> +{
> +  arc4random_check_stir (len);
> +
> +  while (len > 0)
> +    {
> +      if (state->have > 0)
> +	{
> +	  size_t m = min (len, state->have);
> +	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +	  memcpy (buffer, ks, m);
> +	  memset (ks, 0, m);
> +	  buffer += m;
> +	  len -= m;
> +	  state->have -= m;
> +	}
> +      if (state->have == 0)
> +	arc4random_rekey (NULL, 0);
> +    }
> +}
> +
> +void
> +__arc4random_buf (void *buffer, size_t len)
> +{
> +  __libc_lock_lock (__arc4random_lock);
> +  __arc4random_buf_internal (buffer, len);
> +  __libc_lock_unlock (__arc4random_lock);
> +}
> +libc_hidden_def (__arc4random_buf)
> +weak_alias (__arc4random_buf, arc4random_buf)
> +
> +
> +static uint32_t
> +__arc4random_internal (void)
> +{
> +  uint32_t r;
> +
> +  arc4random_check_stir (sizeof (uint32_t));
> +  if (state->have < sizeof (uint32_t))
> +    arc4random_rekey (NULL, 0);
> +  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +  memcpy (&r, ks, sizeof (uint32_t));
> +  memset (ks, 0, sizeof (uint32_t));
> +  state->have -= sizeof (uint32_t);
> +
> +  return r;
> +}
> +
> +uint32_t
> +__arc4random (void)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = __arc4random_internal ();
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random)
> +weak_alias (__arc4random, arc4random)
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> new file mode 100644
> index 0000000000..96ffe62df1
> --- /dev/null
> +++ b/stdlib/arc4random_uniform.c
> @@ -0,0 +1,152 @@
> +/* Random pseudo generator numbers between 0 and 2**-31 (inclusive)
> +   uniformly distributed but with an upper_bound.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <endian.h>
> +#include <libc-lock.h>
> +#include <stdlib.h>
> +#include <sys/param.h>
> +
> +/* Return the number of bytes which cover values up to the limit.  */
> +__attribute__ ((const))
> +static uint32_t
> +byte_count (uint32_t n)
> +{
> +  if (n <= (1U << 8))
> +    return 1;
> +  else if (n <= (1U << 16))
> +    return 2;
> +  else if (n <= (1U << 24))
> +    return 3;
> +  else
> +    return 4;
> +}
> +
> +/* Fill the lower bits of the result with randomness, according to the
> +   number of bytes requested.  */
> +static void
> +random_bytes (uint32_t *result, uint32_t byte_count)
> +{
> +  *result = 0;
> +  unsigned char *ptr = (unsigned char *) result;
> +  if (__BYTE_ORDER == __BIG_ENDIAN)
> +    ptr += 4 - byte_count;
> +  __arc4random_buf_internal (ptr, byte_count);
> +}
> +
> +static uint32_t
> +compute_uniform (uint32_t n)
> +{
> +  if (n <= 1)
> +    /* There is no valid return value for a zero limit, and 0 is the
> +       only possible result for limit 1.  */
> +    return 0;
> +
> +  /* The bits variable serves as a source for bits.  Prefetch the
> +     minimum number of bytes needed.  */
> +  unsigned count = byte_count (n);
> +  uint32_t bits_length = count * CHAR_BIT;
> +  uint32_t bits;
> +  random_bytes (&bits, count);
> +
> +  /* Powers of two are easy.  */
> +  if (powerof2 (n))
> +    return bits & (n - 1);
> +
> +  /* The general case.  This algorithm follows Jérémie Lumbroso,
> +     Optimal Discrete Uniform Generation from Coin Flips, and
> +     Applications (2013), who credits Donald E. Knuth and Andrew
> +     C. Yao, The complexity of nonuniform random number generation
> +     (1976), for solving the general case.
> +
> +     The implementation below unrolls the initialization stage of the
> +     loop, where v is less than n.  */
> +
> +  /* Use 64-bit variables even though the intermediate results are
> +     never larger that 33 bits.  This ensures the code easier to
> +     compile on 64-bit architectures.  */
> +  uint64_t v;
> +  uint64_t c;
> +
> +  /* Initialize v and c.  v is the smallest power of 2 which is larger
> +     than n.*/
> +  {
> +    uint32_t log2p1 = 32 - __builtin_clz (n);
> +    v = 1ULL << log2p1;
> +    c = bits & (v - 1);
> +    bits >>= log2p1;
> +    bits_length -= log2p1;
> +  }
> +
> +  /* At the start of the loop, c is uniformly distributed within the
> +     half-open interval [0, v), and v < 2n < 2**33.  */
> +  while (true)
> +    {
> +      if (v >= n)
> +        {
> +          /* If the candidate is less than n, accept it.  */
> +          if (c < n)
> +            /* c is uniformly distributed on [0, n).  */
> +            return c;
> +          else
> +            {
> +              /* c is uniformly distributed on [n, v).  */
> +              v -= n;
> +              c -= n;
> +              /* The distribution was shifted, so c is uniformly
> +                 distributed on [0, v) again.  */
> +            }
> +        }
> +      /* v < n here.  */
> +
> +      /* Replenish the bit source if necessary.  */
> +      if (bits_length == 0)
> +        {
> +          /* Overwrite the least significant byte.  */
> +	  random_bytes (&bits, 1);
> +	  bits_length = CHAR_BIT;
> +        }
> +
> +      /* Double the range.  No overflow because v < n < 2**32.  */
> +      v *= 2;
> +      /* v < 2n here.  */
> +
> +      /* Extract a bit and append it to c.  c remains less than v and
> +         thus 2**33.  */
> +      c = (c << 1) | (bits & 1);
> +      bits >>= 1;
> +      --bits_length;
> +
> +      /* At this point, c is uniformly distributed on [0, v) again,
> +         and v < 2n < 2**33.  */
> +    }

I'm not familiar with this method.

It's not one of those reviewed at https://www.pcg-random.org/posts/bounded-rands.html

In the patch below I used what the PCG author calls the "Bitmask with 
Rejection (Unbiased) — Apple's Method"

https://github.com/Parrot-Developers/libfutils/commit/9dc7243ae2f2059b4590a702be2ca9c03578067f

I like it because it doesn't use modulo at all :)
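
For readers who don't know the technique, a minimal sketch of bitmask
with rejection follows.  The random source is passed as a callback so
the example is self-contained; sketch_uniform_bitmask, rand32_fn and the
xorshift32 stand-in generator are hypothetical names used only for
illustration, and the stand-in is not cryptographically secure.

```c
#include <stdint.h>

/* Hypothetical callback type: returns 32 fresh random bits.  */
typedef uint32_t (*rand32_fn) (void);

/* Deterministic stand-in source for experiments (NOT secure).  Real code
   would draw from the ChaCha20-based generator instead.  */
static uint32_t sketch_xorshift_state = 2463534242u;

static uint32_t
sketch_xorshift32 (void)
{
  uint32_t x = sketch_xorshift_state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  sketch_xorshift_state = x;
  return x;
}

/* Return a value uniformly distributed in [0, upper_bound), or 0 when
   upper_bound is 0 or 1.  */
static uint32_t
sketch_uniform_bitmask (rand32_fn next32, uint32_t upper_bound)
{
  if (upper_bound < 2)
    return 0;

  uint32_t max = upper_bound - 1;
  /* Smallest all-ones mask that covers max.  max is at least 1 here, so
     __builtin_clz is never called with 0.  */
  uint32_t mask = UINT32_MAX >> __builtin_clz (max);

  uint32_t r;
  do
    /* Keep only the bits that can matter, retry when out of range.  At
       least half of the masked values are accepted.  */
    r = next32 () & mask;
  while (r > max);

  return r;
}
```

For example, sketch_uniform_bitmask (sketch_xorshift32, 6) masks each
draw to 3 bits and rejects the values 6 and 7, so no division or modulo
is needed.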

But OpenBSD's arc4random_uniform() is even simpler in terms of C 
code.
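
For comparison, a sketch in the style of OpenBSD's arc4random_uniform():
it rejects the lowest 2**32 % upper_bound raw values and reduces the
accepted value with a modulo.  This is a paraphrase of that approach,
not a copy of the OpenBSD source; the callback parameter, the stand-in
generator and all sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical callback type: returns 32 fresh random bits.  */
typedef uint32_t (*rand32_fn) (void);

/* Deterministic stand-in source for experiments (NOT secure).  */
static uint32_t sketch_xorshift_state = 2463534242u;

static uint32_t
sketch_xorshift32 (void)
{
  uint32_t x = sketch_xorshift_state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  sketch_xorshift_state = x;
  return x;
}

/* 2**32 % n, computed in 32-bit arithmetic as (2**32 - n) % n.
   N must be nonzero.  */
static uint32_t
sketch_reject_threshold (uint32_t n)
{
  return (uint32_t) (0u - n) % n;
}

/* Return a value uniformly distributed in [0, upper_bound), or 0 when
   upper_bound is 0 or 1.  */
static uint32_t
sketch_uniform_modulo (rand32_fn next32, uint32_t upper_bound)
{
  if (upper_bound < 2)
    return 0;

  /* Raw values below min would make the low residues slightly more
     likely, so they are discarded.  The accepted range
     [min, 2**32) has a length that is a multiple of upper_bound.  */
  uint32_t min = sketch_reject_threshold (upper_bound);

  uint32_t r;
  do
    r = next32 ();
  while (r < min);

  return r % upper_bound;
}
```

The trade-off against the bitmask variant is one modulo per call to
compute the threshold and one for the final reduction, in exchange for
a rejection probability that is below one half for every bound.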


> +}
> +
> +__libc_lock_define (extern , __arc4random_lock attribute_hidden)
> +
> +uint32_t
> +__arc4random_uniform (uint32_t upper_bound)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = compute_uniform (upper_bound);
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random_uniform)
> +weak_alias (__arc4random_uniform, arc4random_uniform)


-- 

Yann Droneaud

OPTEYA




* Re: [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random
  2022-04-19 21:28 ` [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random Adhemerval Zanella
@ 2022-04-22 16:02   ` Yann Droneaud
  2022-04-25 12:36     ` Adhemerval Zanella
  0 siblings, 1 reply; 22+ messages in thread
From: Yann Droneaud @ 2022-04-22 16:02 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 19/04/2022 at 23:28, Adhemerval Zanella via Libc-alpha wrote:
> The arc4random state is moved to TCB, so there is no allocation
> failure.  It adds about 592 bytes struct pthread.

Missing "to": "It adds about 592 bytes to struct pthread"?


>
> Now that the state is thread private within a shared struct, the
>   MADV_WIPEONFORK usage is removed.  The cipher state reset is done
>   solely by the atfork internal handler.
>
> The state is also cleared on thread exit iff it was initialized (so if
> arc4random is not called it is not touched).
>
> Although it is lock-free, arc4random is still not async-signal-safe
> (the per thread state is not updated atomically).
>
> On x86_64 using AVX2 it shows slightly better performance:
>
> From                                     MB/s
> --------------------------------------------------
> arc4random [single-thread]               809.53
> arc4random_buf(16) [single-thread]       1242.56
> arc4random_buf(32) [single-thread]       1915.90
> arc4random_buf(48) [single-thread]       2230.03
> arc4random_buf(64) [single-thread]       2429.68
> arc4random_buf(80) [single-thread]       2489.70
> arc4random_buf(96) [single-thread]       2598.88
> arc4random_buf(112) [single-thread]      2699.93
> arc4random_buf(128) [single-thread]      2747.31
>
> To                                       MB/s
> --------------------------------------------------
> arc4random [single-thread]               941.54
> arc4random_buf(16) [single-thread]       1409.39
> arc4random_buf(32) [single-thread]       2056.17
> arc4random_buf(48) [single-thread]       2367.13
> arc4random_buf(64) [single-thread]       2551.44
> arc4random_buf(80) [single-thread]       2601.38
> arc4random_buf(96) [single-thread]       2710.21
> arc4random_buf(112) [single-thread]      2797.86
> arc4random_buf(128) [single-thread]      2846.12
> --------------------------------------------------
>
> However, it shows a large speedup, especially on architectures with
> more costly atomics.  For instance, on an aarch64 Neoverse N1:
>
>  From                                     MB/s
> --------------------------------------------------
> arc4random [single-thread]               154.98
> arc4random_buf(16) [single-thread]       342.63
> arc4random_buf(32) [single-thread]       485.91
> arc4random_buf(48) [single-thread]       539.95
> arc4random_buf(64) [single-thread]       593.38
> arc4random_buf(80) [single-thread]       629.45
> arc4random_buf(96) [single-thread]       655.78
> arc4random_buf(112) [single-thread]      670.54
> arc4random_buf(128) [single-thread]      681.65
> --------------------------------------------------
>
> To                                       MB/s
> --------------------------------------------------
> arc4random [single-thread]               335.94
> arc4random_buf(16) [single-thread]       498.69
> arc4random_buf(32) [single-thread]       612.24
> arc4random_buf(48) [single-thread]       655.77
> arc4random_buf(64) [single-thread]       691.97
> arc4random_buf(80) [single-thread]       701.68
> arc4random_buf(96) [single-thread]       710.35
> arc4random_buf(112) [single-thread]      714.23
> arc4random_buf(128) [single-thread]      722.13
> --------------------------------------------------
>
> Checked on x86_64-linux-gnu.
> ---
>   nptl/allocatestack.c                   |   5 +-
>   stdlib/arc4random.c                    | 137 +++++++------------------
>   stdlib/arc4random.h                    |  45 ++++++++
>   stdlib/arc4random_uniform.c            |   8 +-
>   stdlib/chacha20.c                      |   3 -
>   stdlib/tst-arc4random-chacha20.c       |   2 +-
>   sysdeps/generic/tls-internal-struct.h  |   3 +
>   sysdeps/unix/sysv/linux/tls-internal.h |  27 ++++-
>   8 files changed, 115 insertions(+), 115 deletions(-)
>   create mode 100644 stdlib/arc4random.h
>
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 01a282f3f6..ada65d40c2 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -32,6 +32,7 @@
>   #include <kernel-features.h>
>   #include <nptl-stack.h>
>   #include <libc-lock.h>
> +#include <tls-internal.h>
>   
>   /* Default alignment of stack.  */
>   #ifndef STACK_ALIGN
> @@ -127,7 +128,7 @@ get_cached_stack (size_t *sizep, void **memp)
>   
>     result->exiting = false;
>     __libc_lock_init (result->exit_lock);
> -  result->tls_state = (struct tls_internal_t) { 0 };
> +  __glibc_tls_internal_init (&result->tls_state);
>   
>     /* Clear the DTV.  */
>     dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
> @@ -559,6 +560,8 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>   #endif
>     pd->robust_head.list = &pd->robust_head;
>   
> +  __glibc_tls_internal_init (&pd->tls_state);
> +
>     /* We place the thread descriptor at the end of the stack.  */
>     *pdp = pd;
>   
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> index cddb0e405a..6144275c08 100644
> --- a/stdlib/arc4random.c
> +++ b/stdlib/arc4random.c
> @@ -16,14 +16,15 @@
>      License along with the GNU C Library; if not, see
>      <http://www.gnu.org/licenses/>.  */
>   
> +#include <arc4random.h>
>   #include <errno.h>
> -#include <libc-lock.h>
>   #include <not-cancel.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <sys/mman.h>
>   #include <sys/param.h>
>   #include <sys/random.h>
> +#include <tls-internal.h>
>   
>   /* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
>      current valid bytes not yet consumed in 'buf', while 'count' is the maximum
> @@ -37,42 +38,16 @@
>      arc4random calls (since only multiple call it will encrypt the next block).
>    */
>   
> -/* Maximum number bytes until reseed (16 MB).  */
> -#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
> -/* Internal buffer size in bytes (1KB).  */
> -#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> -
>   #include <chacha20.c>
>   
> -static struct arc4random_state
> -{
> -  uint32_t ctx[CHACHA20_STATE_LEN];
> -  size_t have;
> -  size_t count;
> -  uint8_t buf[CHACHA20_BUFSIZE];
> -} *state;
> -
> -/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
> -   it does not require to clear the internal state.  */
> -static bool __arc4random_wipeonfork = false;
> -
> -__libc_lock_define_initialized (, __arc4random_lock);
> -
> -/* Called from the fork function to reset the state if MADV_WIPEONFORK is
> -   not supported and to reinit the internal lock.  */
> +/* Called from the fork function to reset the state.  */
>   void
>   __arc4random_fork_subprocess (void)
>   {
> -  if (__arc4random_wipeonfork && state != NULL)
> -    memset (state, 0, sizeof (struct arc4random_state));
> -
> -  __libc_lock_init (__arc4random_lock);
> -}
> -
> -static void
> -arc4random_allocate_failure (void)
> -{
> -  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
> +  struct arc4random_state *state = &__glibc_tls_internal()->rnd_state;
> +  memset (state, 0, sizeof (struct arc4random_state));
> +  /* Force key init.  */
> +  state->count = -1;
>   }
>   
>   static void
> @@ -81,33 +56,10 @@ arc4random_getrandom_failure (void)
>     __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
>   }
>   
> -/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
> -   the fork callback will reset the state on the fork call.  It does not
> -   handle direct clone calls, nor vfork or _Fork (arc4random is not
> -   async-signal-safe due the internal lock usage).  */
> -static void
> -arc4random_init (uint8_t *buf, size_t len)
> -{
> -  state = __mmap (NULL, sizeof (struct arc4random_state),
> -		  PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> -  if (state == MAP_FAILED)
> -    arc4random_allocate_failure ();
> -
> -#ifdef MADV_WIPEONFORK
> -  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
> -  if (r == 0)
> -    __arc4random_wipeonfork = true;
> -  else if (errno != EINVAL)
> -    arc4random_allocate_failure ();
> -#endif
> -
> -  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
> -}
> -
>   #define min(x,y) (((x) > (y)) ? (y) : (x))
>   
>   static void
> -arc4random_rekey (uint8_t *rnd, size_t rndlen)
> +arc4random_rekey (struct arc4random_state *state, uint8_t *rnd, size_t rndlen)
>   {
>     memset (state->buf, 0, sizeof state->buf);
>     chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> @@ -152,41 +104,41 @@ arc4random_getentropy (uint8_t *rnd, size_t len)
>     arc4random_getrandom_failure ();
>   }
>   
> -/* Either allocates the state buffer or reinit it by reseeding the cipher
> -   state with kernel entropy.  */
> -static void
> -arc4random_stir (void)
> +/* Reinit the thread context by reseeding the cipher state with kernel
> +   entropy.  */
> +static struct arc4random_state *
> +arc4random_check_stir (size_t len)
>   {
> -  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> -  arc4random_getentropy (rnd, sizeof rnd);
> +  struct arc4random_state *state = &__glibc_tls_internal()->rnd_state;
>   
> -  if (state == NULL)
> -    arc4random_init (rnd, sizeof rnd);
> -  else
> -    arc4random_rekey (rnd, sizeof rnd);
> +  if (state->count < len || state->count == -1)
> +    {
> +      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> +      arc4random_getentropy (rnd, sizeof rnd);
>   
> -  explicit_bzero (rnd, sizeof rnd);
> +      if (state->count > CHACHE_RESEED_SIZE)
> +	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);

For the state->count == -1 case, chacha20_init() should be called first,
instead of arc4random_rekey(), because the ChaCha20 context is not set up
and the buffer contains no keystream yet:

     if (state->count == -1)
         chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
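
A small sketch of the branch selection that this suggestion makes
explicit, assuming count stays a size_t as in patch 1/9 and that -1
marks a state that was never keyed.  The sketch_* names and the enum
are hypothetical and only model the decision, not the cipher calls.

```c
#include <stddef.h>

#define SKETCH_RESEED_SIZE ((size_t) 16 * 1024 * 1024)
/* Sentinel stored in count by the fork handler and at thread setup.  */
#define SKETCH_UNINITIALIZED ((size_t) -1)

enum sketch_action
{
  SKETCH_NONE,	/* Enough budget left, nothing to do.  */
  SKETCH_INIT,	/* No cipher context yet: key it from fresh entropy.  */
  SKETCH_REKEY	/* Budget exhausted: mix entropy into the context.  */
};

/* Decide what has to happen before LEN bytes are handed out, given the
   remaining byte budget COUNT.  */
static enum sketch_action
sketch_stir_action (size_t count, size_t len)
{
  if (count == SKETCH_UNINITIALIZED)
    return SKETCH_INIT;
  if (count < len)
    return SKETCH_REKEY;
  return SKETCH_NONE;
}
```

Under that assumption (size_t) -1 is the largest size_t value, so the
quoted test state->count > CHACHE_RESEED_SIZE is also true for the
sentinel; comparing against -1 directly states the intent at the call
site without relying on that ordering.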


> +      else
> +	arc4random_rekey (state, rnd, sizeof rnd);
>   
> -  state->have = 0;
> -  memset (state->buf, 0, sizeof state->buf);
> -  state->count = CHACHE_RESEED_SIZE;
> -}
> +      explicit_bzero (rnd, sizeof rnd);
>   
> -static void
> -arc4random_check_stir (size_t len)
> -{
> -  if (state == NULL || state->count < len)
> -    arc4random_stir ();
> +      state->have = 0;
> +      memset (state->buf, 0, sizeof state->buf);
> +      state->count = CHACHE_RESEED_SIZE;
> +    }
>     if (state->count <= len)
>       state->count = 0;
>     else
>       state->count -= len;
> +
> +  return state;
>   }
>   
>   void
> -__arc4random_buf_internal (void *buffer, size_t len)
> +__arc4random_buf (void *buffer, size_t len)
>   {
> -  arc4random_check_stir (len);
> +  struct arc4random_state *state = arc4random_check_stir (len);
>   
>     while (len > 0)
>       {
> @@ -201,29 +153,20 @@ __arc4random_buf_internal (void *buffer, size_t len)
>   	  state->have -= m;
>   	}
>         if (state->have == 0)
> -	arc4random_rekey (NULL, 0);
> +	arc4random_rekey (state, NULL, 0);
>       }
>   }
> -
> -void
> -__arc4random_buf (void *buffer, size_t len)
> -{
> -  __libc_lock_lock (__arc4random_lock);
> -  __arc4random_buf_internal (buffer, len);
> -  __libc_lock_unlock (__arc4random_lock);
> -}
>   libc_hidden_def (__arc4random_buf)
>   weak_alias (__arc4random_buf, arc4random_buf)
>   
> -
> -static uint32_t
> -__arc4random_internal (void)
> +uint32_t
> +__arc4random (void)
>   {
>     uint32_t r;
>   
> -  arc4random_check_stir (sizeof (uint32_t));
> +  struct arc4random_state *state = arc4random_check_stir (sizeof (uint32_t));
>     if (state->have < sizeof (uint32_t))
> -    arc4random_rekey (NULL, 0);
> +    arc4random_rekey (state, NULL, 0);
>     uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
>     memcpy (&r, ks, sizeof (uint32_t));
>     memset (ks, 0, sizeof (uint32_t));
> @@ -231,15 +174,5 @@ __arc4random_internal (void)
>   
>     return r;
>   }
> -
> -uint32_t
> -__arc4random (void)
> -{
> -  uint32_t r;
> -  __libc_lock_lock (__arc4random_lock);
> -  r = __arc4random_internal ();
> -  __libc_lock_unlock (__arc4random_lock);
> -  return r;
> -}
>   libc_hidden_def (__arc4random)
>   weak_alias (__arc4random, arc4random)
> diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
> new file mode 100644
> index 0000000000..40672299d0
> --- /dev/null
> +++ b/stdlib/arc4random.h
> @@ -0,0 +1,45 @@
> +/* Arc4random definition used on TLS.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CHACHA20_H
> +#define _CHACHA20_H
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +/* Internal ChaCha20 state.  */
> +#define CHACHA20_STATE_LEN	16
> +#define CHACHA20_BLOCK_SIZE	64
> +
> +/* Maximum number bytes until reseed (16 MB).  */
> +#define CHACHE_RESEED_SIZE	(16 * 1024 * 1024)
> +
> +/* Internal arc4random buffer, used on each feedback step so offer some
> +   backtracking protection and to allow better used of vectorized
> +   chacha20 implementations.  */
> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> +
> +struct arc4random_state
> +{
> +  uint32_t ctx[CHACHA20_STATE_LEN];
> +  size_t have;
> +  size_t count;
> +  uint8_t buf[CHACHA20_BUFSIZE];
> +};
> +
> +#endif
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> index 96ffe62df1..7d0140c375 100644
> --- a/stdlib/arc4random_uniform.c
> +++ b/stdlib/arc4random_uniform.c
> @@ -46,7 +46,7 @@ random_bytes (uint32_t *result, uint32_t byte_count)
>     unsigned char *ptr = (unsigned char *) result;
>     if (__BYTE_ORDER == __BIG_ENDIAN)
>       ptr += 4 - byte_count;
> -  __arc4random_buf_internal (ptr, byte_count);
> +  __arc4random_buf (ptr, byte_count);
>   }
>   
>   static uint32_t
> @@ -142,11 +142,7 @@ __libc_lock_define (extern , __arc4random_lock attribute_hidden)
>   uint32_t
>   __arc4random_uniform (uint32_t upper_bound)
>   {
> -  uint32_t r;
> -  __libc_lock_lock (__arc4random_lock);
> -  r = compute_uniform (upper_bound);
> -  __libc_lock_unlock (__arc4random_lock);
> -  return r;
> +  return compute_uniform (upper_bound);
>   }
>   libc_hidden_def (__arc4random_uniform)
>   weak_alias (__arc4random_uniform, arc4random_uniform)
> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> index fea4994169..0fb55c0fa3 100644
> --- a/stdlib/chacha20.c
> +++ b/stdlib/chacha20.c
> @@ -26,11 +26,8 @@
>   #define CHACHA20_IV_SIZE	16
>   #define CHACHA20_KEY_SIZE	32
>   
> -#define CHACHA20_BLOCK_SIZE     64
>   #define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
>   
> -#define CHACHA20_STATE_LEN	16
> -
>   /* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
>      Sby RFC8439.  Since the input stream will either zero bytes (initial state)
>      or the PRNG output itself this step does not add any extra entropy.   */
> diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c
> index dd0ef6d8ba..614e6e0736 100644
> --- a/stdlib/tst-arc4random-chacha20.c
> +++ b/stdlib/tst-arc4random-chacha20.c
> @@ -16,11 +16,11 @@
>      License along with the GNU C Library; if not, see
>      <http://www.gnu.org/licenses/>.  */
>   
> +#include <arc4random.h>
>   #include <support/check.h>
>   #include <sys/cdefs.h>
>   
>   /* It does not define CHACHA20_XOR_FINAL to check what glibc actual uses. */
> -#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
>   #include <chacha20.c>
>   
>   static int
> diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
> index d76c715a96..5d0e2fba53 100644
> --- a/sysdeps/generic/tls-internal-struct.h
> +++ b/sysdeps/generic/tls-internal-struct.h
> @@ -19,10 +19,13 @@
>   #ifndef _TLS_INTERNAL_STRUCT_H
>   #define _TLS_INTERNAL_STRUCT_H 1
>   
> +#include <stdlib/arc4random.h>
> +
>   struct tls_internal_t
>   {
>     char *strsignal_buf;
>     char *strerror_l_buf;
> +  struct arc4random_state rnd_state;
>   };
>   
>   #endif
> diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
> index f7a1a62135..16ff836d05 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.h
> +++ b/sysdeps/unix/sysv/linux/tls-internal.h
> @@ -22,6 +22,19 @@
>   #include <stdlib.h>
>   #include <pthreadP.h>
>   
> +static inline void
> +__glibc_tls_internal_init (struct tls_internal_t *tls_state)
> +{
> +  tls_state->strsignal_buf = NULL;
> +  tls_state->strerror_l_buf = NULL;
> +
> +  /* Force key init on created threads.  There is no need to clear the
> +     initial state since it will be done either by allocation a new
> +     stack (through mmap with MAP_ANONYMOUS) or by the free function
> +     below).  */
> +  tls_state->rnd_state.count = -1;
> +}
> +
>   static inline struct tls_internal_t *
>   __glibc_tls_internal (void)
>   {
> @@ -31,8 +44,18 @@ __glibc_tls_internal (void)
>   static inline void
>   __glibc_tls_internal_free (void)
>   {
> -  free (THREAD_SELF->tls_state.strsignal_buf);
> -  free (THREAD_SELF->tls_state.strerror_l_buf);
> +  struct pthread *self = THREAD_SELF;
> +  free (self->tls_state.strsignal_buf);
> +  free (self->tls_state.strerror_l_buf);
> +  if (self->tls_state.rnd_state.count != -1)
> +    {
> +      /* Clear any lingering random state prior so if the thread stack
> +	 is cached it won't leak any data.  */
> +      memset (&self->tls_state.rnd_state, 0,
> +	      sizeof self->tls_state.rnd_state);
> +      /* Force key init on created threads.  */
> +      self->tls_state.rnd_state.count = -1;

Setting count to -1 here is probably not needed, as the init function
sets it again when the thread state is reused.


> +    }
>   }
>   
>   #endif

-- 

Yann Droneaud

OPTEYA



* Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20
  2022-04-20 19:23     ` Adhemerval Zanella
@ 2022-04-22 21:09       ` Paul E Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E Murphy @ 2022-04-22 21:09 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha



On 4/20/22 2:23 PM, Adhemerval Zanella wrote:
> 
> 
> On 20/04/2022 15:38, Paul E Murphy wrote:
>>
>>
>> On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote:
>>> It adds vectorized ChaCha20 implementation based on libgcrypt
>>> cipher/chacha20-ppc.c.  It targets POWER8 and it is used on
>>> default for LE.
>>
>>> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
>>> new file mode 100644
>>> index 0000000000..e2567c379a
>>> --- /dev/null
>>> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
>>
>> How difficult is it to keep this synchronized with the upstream version in libgcrypt?  Also, this seems like it would be a better placed in the power8 subdirectory.
> 
> It would be somewhat complicate because libgcrypt also implements the
> poly1305 on the same file (which uses common macros and definition
> for chacha20) and it adds final XOR based on input stream (which
> for arc4random usage is not required since it does not add any
> hardening).
> 
> It would require to refactor libgcrypt code a bit to split the
> chacha and poly1305 and to add a macro to XOR the input.

I think this is OK. Thanks for the explanation.

> 
>>
>>> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>>> new file mode 100644
>>> index 0000000000..a18115392f
>>> --- /dev/null
>>> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>>> @@ -0,0 +1,47 @@
>>> +/* PowerPC optimization for ChaCha20.
>>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>>> +   This file is part of the GNU C Library.
>>> +
>>> +   The GNU C Library is free software; you can redistribute it and/or
>>> +   modify it under the terms of the GNU Lesser General Public
>>> +   License as published by the Free Software Foundation; either
>>> +   version 2.1 of the License, or (at your option) any later version.
>>> +
>>> +   The GNU C Library is distributed in the hope that it will be useful,
>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> +   Lesser General Public License for more details.
>>> +
>>> +   You should have received a copy of the GNU Lesser General Public
>>> +   License along with the GNU C Library; if not, see
>>> +   <http://www.gnu.org/licenses/>.  */
>>> +
>>> +#include <stdbool.h>
>>> +#include <ldsodefs.h>
>>> +
>>> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
>>> +                    const uint8_t *src, size_t nblks)
>>> +     attribute_hidden;
>>> +
>>> +static void
>>> +chacha20_crypt (uint32_t *state, uint8_t *dst,
>>> +        const uint8_t *src, size_t bytes)
>>> +{
>>> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
>>> +          "CHACHA20_BUFSIZE not multiple of 4");
>>> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
>>> +          "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
>>> +
>>> +#ifdef __LITTLE_ENDIAN__
>>> +  __chacha20_power8_blocks4 (state, dst, src,
>>> +                 CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>>> +#else
>>> +  unsigned long int hwcap = GLRO(dl_hwcap);
>>> +  unsigned long int hwcap2 = GLRO(dl_hwcap2);
>>> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
>>> +    __chacha20_power8_blocks4 (state, dst, src,
>>> +                   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>>> +  else
>>> +    chacha20_crypt_generic (state, dst, src, bytes);
>>> +#endif
>>
>> This file doesn't seem to obey the multiarch conventions of other powerpc64 specific bits. Is it possible to implement multiarch support similar to the libc/libm routines?
> 
> I am not very found of the powerpc multiarch convention and it would
> require some more boilerplate code to handle BE, but it is doable.
> 
> So LE will continue to use __chacha20_power8_blocks4 as
> default, while BE will just select if --with-arch=power8 is defined
> for for default build.  With --disable-multi-arch the power8 will be
> select iff --with-arch=power8 is set.
> 
> ---
> 
> diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile
> index 18943ef09e..679d5e49ba 100644
> --- a/sysdeps/powerpc/powerpc64/Makefile
> +++ b/sysdeps/powerpc/powerpc64/Makefile
> @@ -66,9 +66,6 @@ tst-setjmp-bug21895-static-ENV = \
>   endif
>   
>   ifeq ($(subdir),stdlib)
> -sysdep_routines += chacha20-ppc
> -CFLAGS-chacha20-ppc.c += -mcpu=power8
> -
>   CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec
>   tests += tst-ucontext-ppc64-vscr
>   endif
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> new file mode 100644
> index 0000000000..8c75165f7f
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
> @@ -0,0 +1,4 @@
> +ifeq ($(subdir),stdlib)
> +sysdep_routines += chacha20-ppc
> +CFLAGS-chacha20-ppc.c += -mcpu=power8
> +endif
> diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> new file mode 100644
> index 0000000000..cf9e735326
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
> @@ -0,0 +1 @@
> +#include <sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c>
> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> similarity index 92%
> rename from sysdeps/powerpc/powerpc64/chacha20_arch.h
> rename to sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> index a18115392f..6d2762d82b 100644
> --- a/sysdeps/powerpc/powerpc64/chacha20_arch.h
> +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
> @@ -32,10 +32,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
>     _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
>   		  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
>   
> -#ifdef __LITTLE_ENDIAN__
> -  __chacha20_power8_blocks4 (state, dst, src,
> -			     CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> -#else
>     unsigned long int hwcap = GLRO(dl_hwcap);
>     unsigned long int hwcap2 = GLRO(dl_hwcap2);
>     if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> @@ -43,5 +39,4 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
>   			       CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>     else
>       chacha20_crypt_generic (state, dst, src, bytes);
> -#endif
>   }
> diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
> index 71a59529f3..abb0aa3f11 100644
> --- a/sysdeps/powerpc/powerpc64/power8/Makefile
> +++ b/sysdeps/powerpc/powerpc64/power8/Makefile
> @@ -1,3 +1,8 @@
>   ifeq ($(subdir),string)
>   sysdep_routines += strcasestr-ppc64
>   endif
> +
> +ifeq ($(subdir),stdlib)
> +sysdep_routines += chacha20-ppc
> +CFLAGS-chacha20-ppc.c += -mcpu=power8

Is it required to specify mcpu=power8 here?  I am thinking about the 
case of building glibc for power9 (or newer), which could benefit from 
improved instruction selection when using the VSX builtins.

I think this is improved over V3, and seems OK. Thanks. It would be nice 
to refactor the multiarch/multi-cpu code on powerpc, I agree it is not 
ideal in its current implementation.


* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
  2022-04-19 21:52   ` H.J. Lu
  2022-04-22 13:54   ` Yann Droneaud
@ 2022-04-25  2:22   ` Mark Harris
  2022-04-25 12:26     ` Adhemerval Zanella
  2 siblings, 1 reply; 22+ messages in thread
From: Mark Harris @ 2022-04-25  2:22 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha, Florian Weimer

On Tue, Apr 19, 2022 at 2:29 PM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> The implementation is based on scalar Chacha20, with global cache and
> locking.  It uses getrandom or /dev/urandom as fallback to get the
> initial entropy, and reseeds the internal state on every 16MB of
> consumed buffer.
>
> It maintains an internal buffer which consumes at maximum one page on
> most systems (assuming minimum of 4k pages).  The internal buf optimizes
> the cipher encrypt calls, by amortize arc4random calls (where both

s/amortize/amortizing/

> function call and locks cost are the dominating factor).

s/locks/lock/

>
> The ChaCha20 implementation is based on the RFC8439 [1], with last
> step that XOR with the input omited.  Since the input stream will either
> zero bytes (initial state) or the PRNG output itself this step does not
> add any extra entropy.

The src argument to chacha20_crypt is always zeros, never PRNG output.
Perhaps it would be clearer to say something like this:

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.
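
A small sketch of why the XOR can be dropped when the plaintext is all
zeros: x ^ 0 == x, so the "ciphertext" is the keystream itself.  The helper
names below are hypothetical and the keystream is just a caller-supplied
byte array, not real ChaCha20 output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stream-cipher style encryption: dst = src XOR keystream.  */
static void
xor_stream (uint8_t *dst, const uint8_t *src, const uint8_t *ks, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] ^ ks[i];
}

/* Return 1 if encrypting n zero bytes reproduces the keystream
   unchanged, 0 otherwise (including when n exceeds the local buffer).  */
static int
zero_plaintext_is_identity (const uint8_t *ks, size_t n)
{
  uint8_t zeros[64] = { 0 };
  uint8_t out[64];
  if (n > sizeof out)
    return 0;
  xor_stream (out, zeros, ks, n);
  return memcmp (out, ks, n) == 0;
}
```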

>
> The arc4random_uniform is based on previous work by Florian Weimer.
>
> Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
>
> Co-authored-by: Florian Weimer <fweimer@redhat.com>
>
> [1] https://datatracker.ietf.org/doc/html/rfc8439
> ---
>  NEWS                                          |   4 +-
>  include/stdlib.h                              |  13 +
>  posix/fork.c                                  |   2 +
>  stdlib/Makefile                               |   2 +
>  stdlib/Versions                               |   5 +
>  stdlib/arc4random.c                           | 245 ++++++++++++++++++
>  stdlib/arc4random_uniform.c                   | 152 +++++++++++
>  stdlib/chacha20.c                             | 163 ++++++++++++
>  stdlib/stdlib.h                               |  14 +
>  sysdeps/generic/not-cancel.h                  |   2 +
>  sysdeps/mach/hurd/i386/libc.abilist           |   3 +
>  sysdeps/mach/hurd/not-cancel.h                |   3 +
>  sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
>  sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
>  sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
>  sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
>  .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
>  .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
>  .../sysv/linux/microblaze/be/libc.abilist     |   3 +
>  .../sysv/linux/microblaze/le/libc.abilist     |   3 +
>  .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
>  .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
>  .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
>  .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
>  sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
>  sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
>  .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
>  .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
>  .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
>  .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
>  .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
>  .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
>  sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
>  sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
>  .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
>  .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
>  .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
>  .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
>  46 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 stdlib/arc4random.c
>  create mode 100644 stdlib/arc4random_uniform.c
>  create mode 100644 stdlib/chacha20.c
>
> diff --git a/NEWS b/NEWS
> index 4b6d9de2b5..4d9d95b35b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,9 @@ Version 2.36
>
>  Major new features:
>
> -  [Add new features here]
> +* The functions arc4random, arc4random_buf, arc4random_uniform have been
> +  added.  The functions use a cryptographic pseudo-random number generator
> +  based on ChaCha20 initilized with entropy from kernel.
>
>  Deprecated and removed features, and other changes affecting compatibility:
>
> diff --git a/include/stdlib.h b/include/stdlib.h
> index 1c6f70b082..055f9d2965 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
>  libc_hidden_proto (grantpt)
>  libc_hidden_proto (unlockpt)
>
> +__typeof (arc4random) __arc4random;
> +libc_hidden_proto (__arc4random);
> +__typeof (arc4random_buf) __arc4random_buf;
> +libc_hidden_proto (__arc4random_buf);
> +__typeof (arc4random_uniform) __arc4random_uniform;
> +libc_hidden_proto (__arc4random_uniform);
> +extern void __arc4random_buf_internal (void *buffer, size_t len)
> +     attribute_hidden;
> +/* Called from the fork function to reinitialize the internal lock in thte

s/thte/the/

> +   child process.  This avoids deadlocks if fork is called in multi-threaded
> +   processes.  */
> +extern void __arc4random_fork_subprocess (void) attribute_hidden;
> +
>  extern double __strtod_internal (const char *__restrict __nptr,
>                                  char **__restrict __endptr, int __group)
>       __THROW __nonnull ((1)) __wur;
> diff --git a/posix/fork.c b/posix/fork.c
> index 6b50c091f9..87d8329b46 100644
> --- a/posix/fork.c
> +++ b/posix/fork.c
> @@ -96,6 +96,8 @@ __libc_fork (void)
>                                      &nss_database_data);
>         }
>
> +      call_function_static_weak (__arc4random_fork_subprocess);
> +
>        /* Reset the lock the dynamic loader uses to protect its data.  */
>        __rtld_lock_initialize (GL(dl_load_lock));
>
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index 60fc59c12c..9f9cc1bd7f 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -53,6 +53,8 @@ routines := \
>    a64l \
>    abort \
>    abs \
> +  arc4random \
> +  arc4random_uniform \
>    at_quick_exit \
>    atof \
>    atoi \
> diff --git a/stdlib/Versions b/stdlib/Versions
> index 5e9099a153..d09a308fb5 100644
> --- a/stdlib/Versions
> +++ b/stdlib/Versions
> @@ -136,6 +136,11 @@ libc {
>      strtof32; strtof64; strtof32x;
>      strtof32_l; strtof64_l; strtof32x_l;
>    }
> +  GLIBC_2.36 {
> +    arc4random;
> +    arc4random_buf;
> +    arc4random_uniform;
> +  }
>    GLIBC_PRIVATE {
>      # functions which have an additional interface since they are
>      # are cancelable.
> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> new file mode 100644
> index 0000000000..cddb0e405a
> --- /dev/null
> +++ b/stdlib/arc4random.c
> @@ -0,0 +1,245 @@
> +/* Pseudo Random Number Generator based on ChaCha20.
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <libc-lock.h>
> +#include <not-cancel.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/mman.h>
> +#include <sys/param.h>
> +#include <sys/random.h>
> +
> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
> +   number of bytes until a reseed.
> +
> +   Both the initial seed an reseed tries to obtain entropy from the kernel
> +   and abort the process if none could be obtained.
> +
> +   The state 'buf' improves the usage of the cipher call, allowing to call
> +   optimized implementations (if the archictecture provides it) and optimize
> +   arc4random calls (since only multiple call it will encrypt the next block).
> + */
> +
> +/* Maximum number bytes until reseed (16 MB).  */
> +#define CHACHE_RESEED_SIZE     (16 * 1024 * 1024)

Should this be CHACHA20_RESEED_SIZE?

> +/* Internal buffer size in bytes (1KB).  */
> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)

8 * 64 = 512; should this be (16 * CHACHA20_BLOCK_SIZE)?
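
One way to keep the comment and the value from drifting apart is a
compile-time check.  This is only a sketch: TOY_BLOCK_SIZE and toy_bufsize
are hypothetical names, with the block size taken from the 64 used above.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the ChaCha20 block size in the patch (64 bytes).  */
#define TOY_BLOCK_SIZE 64

/* Buffer size in bytes for a given number of cipher blocks.  */
static size_t
toy_bufsize (size_t blocks)
{
  return blocks * TOY_BLOCK_SIZE;
}

/* 8 blocks give 512 bytes; a 1 KB buffer would need 16 blocks.  */
_Static_assert (8 * TOY_BLOCK_SIZE == 512, "8 blocks are 512 bytes");
_Static_assert (16 * TOY_BLOCK_SIZE == 1024, "16 blocks are 1 KB");
```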

> +
> +#include <chacha20.c>
> +
> +static struct arc4random_state
> +{
> +  uint32_t ctx[CHACHA20_STATE_LEN];
> +  size_t have;
> +  size_t count;
> +  uint8_t buf[CHACHA20_BUFSIZE];
> +} *state;
> +
> +/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
> +   it does not require to clear the internal state.  */
> +static bool __arc4random_wipeonfork = false;
> +
> +__libc_lock_define_initialized (, __arc4random_lock);
> +
> +/* Called from the fork function to reset the state if MADV_WIPEONFORK is
> +   not supported and to reinit the internal lock.  */
> +void
> +__arc4random_fork_subprocess (void)
> +{
> +  if (__arc4random_wipeonfork && state != NULL)
> +    memset (state, 0, sizeof (struct arc4random_state));
> +
> +  __libc_lock_init (__arc4random_lock);
> +}
> +
> +static void
> +arc4random_allocate_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
> +}
> +
> +static void
> +arc4random_getrandom_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
> +}
> +
> +/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
> +   the fork callback will reset the state on the fork call.  It does not
> +   handle direct clone calls, nor vfork or _Fork (arc4random is not
> +   async-signal-safe due the internal lock usage).  */
> +static void
> +arc4random_init (uint8_t *buf, size_t len)

len is not used in this function.

> +{
> +  state = __mmap (NULL, sizeof (struct arc4random_state),
> +                 PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +  if (state == MAP_FAILED)
> +    arc4random_allocate_failure ();
> +
> +#ifdef MADV_WIPEONFORK
> +  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
> +  if (r == 0)
> +    __arc4random_wipeonfork = true;
> +  else if (errno != EINVAL)
> +    arc4random_allocate_failure ();
> +#endif
> +
> +  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
> +}
> +
> +#define min(x,y) (((x) > (y)) ? (y) : (x))
> +
> +static void
> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
> +{
> +  memset (state->buf, 0, sizeof state->buf);
> +  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +
> +  /* Mix some extra entropy if provided.  */
> +  if (rnd != NULL)
> +    {
> +      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +      for (size_t i = 0; i < m; i++)
> +       state->buf[i] ^= rnd[i];
> +    }
> +
> +  /* Immediately reinit for backtracking resistance.  */
> +  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> +  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +}
> +
> +static void
> +arc4random_getentropy (uint8_t *rnd, size_t len)
> +{
> +  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +    return;
> +
> +  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);

Should this be O_RDONLY | O_CLOEXEC?
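
For illustration, a sketch using the public open/fcntl calls rather than
the internal __open64_nocancel: with O_CLOEXEC the descriptor gets
FD_CLOEXEC set atomically at open time, so it cannot leak into a program
started by exec from another thread while the read is in progress.  The
function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if a descriptor opened with O_CLOEXEC has FD_CLOEXEC set,
   0 if it does not, and -1 if the device cannot be opened or queried.  */
static int
urandom_fd_is_cloexec (void)
{
  int fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC);
  if (fd == -1)
    return -1;
  int flags = fcntl (fd, F_GETFD);
  close (fd);
  if (flags == -1)
    return -1;
  return (flags & FD_CLOEXEC) != 0;
}
```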

> +  if (fd != -1)
> +    {
> +      unsigned char *p = rnd;
> +      unsigned char *end = p + len;

uint8_t * would be consistent with the declaration of rnd.

> +      do
> +       {
> +         ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> +         if (ret <= 0)
> +           arc4random_getrandom_failure ();
> +         p += ret;
> +       }
> +      while (p < end);
> +
> +      if (__close_nocancel (fd) != 0)

Should this be == 0?
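
As quoted, a successful close (return value 0) falls through to the
failure call, while a failed close returns as if everything went fine.  A
sketch of the intended flow, using the public open/read/close calls instead
of the _nocancel variants and returning an error code instead of aborting;
toy_fill_from and its path parameter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Fill rnd[0..len) from the device at PATH.  Return 0 on success and
   -1 on any failure (open, short read, or close).  */
static int
toy_fill_from (const char *path, uint8_t *rnd, size_t len)
{
  int fd = open (path, O_RDONLY | O_CLOEXEC);
  if (fd == -1)
    return -1;

  uint8_t *p = rnd;
  uint8_t *end = rnd + len;
  while (p < end)
    {
      ssize_t ret = read (fd, p, (size_t) (end - p));
      if (ret == -1 && errno == EINTR)
        continue;               /* Interrupted: retry the read.  */
      if (ret <= 0)
        {
          close (fd);
          return -1;
        }
      p += ret;
    }

  /* Success is close returning 0; anything else is a failure.  */
  return close (fd) == 0 ? 0 : -1;
}
```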

> +       return;
> +    }
> +  arc4random_getrandom_failure ();
> +}
> +
> +/* Either allocates the state buffer or reinit it by reseeding the cipher
> +   state with kernel entropy.  */
> +static void
> +arc4random_stir (void)
> +{
> +  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> +  arc4random_getentropy (rnd, sizeof rnd);
> +
> +  if (state == NULL)
> +    arc4random_init (rnd, sizeof rnd);
> +  else
> +    arc4random_rekey (rnd, sizeof rnd);
> +
> +  explicit_bzero (rnd, sizeof rnd);
> +
> +  state->have = 0;
> +  memset (state->buf, 0, sizeof state->buf);
> +  state->count = CHACHE_RESEED_SIZE;
> +}
> +
> +static void
> +arc4random_check_stir (size_t len)
> +{
> +  if (state == NULL || state->count < len)
> +    arc4random_stir ();
> +  if (state->count <= len)
> +    state->count = 0;
> +  else
> +    state->count -= len;
> +}
> +
> +void
> +__arc4random_buf_internal (void *buffer, size_t len)
> +{
> +  arc4random_check_stir (len);
> +
> +  while (len > 0)
> +    {
> +      if (state->have > 0)
> +       {
> +         size_t m = min (len, state->have);
> +         uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +         memcpy (buffer, ks, m);
> +         memset (ks, 0, m);
> +         buffer += m;
> +         len -= m;
> +         state->have -= m;
> +       }
> +      if (state->have == 0)
> +       arc4random_rekey (NULL, 0);
> +    }
> +}
> +
> +void
> +__arc4random_buf (void *buffer, size_t len)
> +{
> +  __libc_lock_lock (__arc4random_lock);
> +  __arc4random_buf_internal (buffer, len);
> +  __libc_lock_unlock (__arc4random_lock);
> +}
> +libc_hidden_def (__arc4random_buf)
> +weak_alias (__arc4random_buf, arc4random_buf)
> +
> +
> +static uint32_t
> +__arc4random_internal (void)
> +{
> +  uint32_t r;
> +
> +  arc4random_check_stir (sizeof (uint32_t));
> +  if (state->have < sizeof (uint32_t))
> +    arc4random_rekey (NULL, 0);
> +  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +  memcpy (&r, ks, sizeof (uint32_t));
> +  memset (ks, 0, sizeof (uint32_t));
> +  state->have -= sizeof (uint32_t);
> +
> +  return r;
> +}
> +
> +uint32_t
> +__arc4random (void)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = __arc4random_internal ();
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random)
> +weak_alias (__arc4random, arc4random)
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> new file mode 100644
> index 0000000000..96ffe62df1
> --- /dev/null
> +++ b/stdlib/arc4random_uniform.c
> @@ -0,0 +1,152 @@
> +/* Random pseudo generator numbers between 0 and 2**-31 (inclusive)
> +   uniformly distributed but with an upper_bound.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <endian.h>
> +#include <libc-lock.h>
> +#include <stdlib.h>
> +#include <sys/param.h>
> +
> +/* Return the number of bytes which cover values up to the limit.  */
> +__attribute__ ((const))
> +static uint32_t
> +byte_count (uint32_t n)
> +{
> +  if (n <= (1U << 8))
> +    return 1;
> +  else if (n <= (1U << 16))
> +    return 2;
> +  else if (n <= (1U << 24))
> +    return 3;
> +  else
> +    return 4;
> +}
> +
> +/* Fill the lower bits of the result with randomness, according to the
> +   number of bytes requested.  */
> +static void
> +random_bytes (uint32_t *result, uint32_t byte_count)
> +{
> +  *result = 0;
> +  unsigned char *ptr = (unsigned char *) result;
> +  if (__BYTE_ORDER == __BIG_ENDIAN)
> +    ptr += 4 - byte_count;
> +  __arc4random_buf_internal (ptr, byte_count);
> +}
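
For anyone checking the big-endian path, here is a small standalone sketch of the pointer adjustment above.  It is not part of the patch: fill_low_bytes and its fill argument are hypothetical names, a constant fill byte stands in for __arc4random_buf_internal so the result is predictable, and the byte order is probed at run time rather than through __BYTE_ORDER.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for random_bytes: write COUNT copies of FILL
   into the COUNT least significant bytes of *RESULT.  COUNT must be
   between 1 and 4.  */
static void
fill_low_bytes (uint32_t *result, uint32_t count, unsigned char fill)
{
  /* Probe the host byte order: the first byte of the value 1 is 1 on
     little-endian hosts and 0 on big-endian hosts.  */
  const uint32_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);

  /* On a big-endian host the least significant bytes sit at the end
     of the object, so skip the leading 4 - COUNT bytes.  */
  size_t offset = (first == 0) ? 4 - count : 0;

  *result = 0;
  memset ((unsigned char *) result + offset, fill, count);
}
```

With a count of 3 and a fill of 0xff the result should be 0x00ffffff on either byte order, which is the property compute_uniform relies on when it masks and shifts bits.
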
> +
> +static uint32_t
> +compute_uniform (uint32_t n)
> +{
> +  if (n <= 1)
> +    /* There is no valid return value for a zero limit, and 0 is the
> +       only possible result for limit 1.  */
> +    return 0;
> +
> +  /* The bits variable serves as a source for bits.  Prefetch the
> +     minimum number of bytes needed.  */
> +  unsigned count = byte_count (n);

uint32_t would be consistent with the declaration of byte_count.

> +  uint32_t bits_length = count * CHAR_BIT;
> +  uint32_t bits;
> +  random_bytes (&bits, count);
> +
> +  /* Powers of two are easy.  */
> +  if (powerof2 (n))
> +    return bits & (n - 1);
> +
> +  /* The general case.  This algorithm follows Jérémie Lumbroso,
> +     Optimal Discrete Uniform Generation from Coin Flips, and
> +     Applications (2013), who credits Donald E. Knuth and Andrew
> +     C. Yao, The complexity of nonuniform random number generation
> +     (1976), for solving the general case.
> +
> +     The implementation below unrolls the initialization stage of the
> +     loop, where v is less than n.  */
> +
> +  /* Use 64-bit variables even though the intermediate results are
> +     never larger that 33 bits.  This ensures the code easier to

s/that/than/
s/the code/that the code is/

> +     compile on 64-bit architectures.  */
> +  uint64_t v;
> +  uint64_t c;
> +
> +  /* Initialize v and c.  v is the smallest power of 2 which is larger
> +     than n.*/
> +  {
> +    uint32_t log2p1 = 32 - __builtin_clz (n);
> +    v = 1ULL << log2p1;
> +    c = bits & (v - 1);
> +    bits >>= log2p1;
> +    bits_length -= log2p1;
> +  }
> +
> +  /* At the start of the loop, c is uniformly distributed within the
> +     half-open interval [0, v), and v < 2n < 2**33.  */
> +  while (true)
> +    {
> +      if (v >= n)
> +        {
> +          /* If the candidate is less than n, accept it.  */
> +          if (c < n)
> +            /* c is uniformly distributed on [0, n).  */
> +            return c;
> +          else
> +            {
> +              /* c is uniformly distributed on [n, v).  */
> +              v -= n;
> +              c -= n;
> +              /* The distribution was shifted, so c is uniformly
> +                 distributed on [0, v) again.  */
> +            }
> +        }
> +      /* v < n here.  */
> +
> +      /* Replenish the bit source if necessary.  */
> +      if (bits_length == 0)
> +        {
> +          /* Overwrite the least significant byte.  */
> +         random_bytes (&bits, 1);
> +         bits_length = CHAR_BIT;
> +        }
> +
> +      /* Double the range.  No overflow because v < n < 2**32.  */
> +      v *= 2;
> +      /* v < 2n here.  */
> +
> +      /* Extract a bit and append it to c.  c remains less than v and
> +         thus 2**33.  */
> +      c = (c << 1) | (bits & 1);
> +      bits >>= 1;
> +      --bits_length;
> +
> +      /* At this point, c is uniformly distributed on [0, v) again,
> +         and v < 2n < 2**33.  */
> +    }
> +}
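
To make the loop invariant easier to follow, here is a minimal sketch of the same rejection scheme without the unrolled initialization stage and without the byte-wise bit buffer.  It is an illustration only, not the code under review: struct bit_source, next_bit, and dice_roll are hypothetical names, and the bit source simply replays a fixed array so the behaviour can be traced by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit source: replays a fixed array of 0/1 values,
   wrapping around at the end.  A real caller would draw unbiased
   bits from a CSPRNG; with a degenerate pattern the loop below may
   never terminate.  */
struct bit_source
{
  const unsigned char *bits;
  size_t len;
  size_t pos;
};

static unsigned int
next_bit (struct bit_source *src)
{
  unsigned int b = src->bits[src->pos] & 1;
  src->pos = (src->pos + 1) % src->len;
  return b;
}

/* Return a value in [0, n), consuming one bit per iteration.  */
static uint32_t
dice_roll (uint32_t n, struct bit_source *src)
{
  if (n <= 1)
    return 0;

  /* Invariant: c is uniformly distributed on [0, v).  64-bit
     variables keep the doubling of v free of overflow.  */
  uint64_t v = 1;
  uint64_t c = 0;
  for (;;)
    {
      v *= 2;
      c = (c << 1) | next_bit (src);
      if (v >= n)
        {
          if (c < n)
            return (uint32_t) c;
          /* Reject, but keep the leftover randomness: c - n is
             uniformly distributed on [0, v - n).  */
          v -= n;
          c -= n;
        }
    }
}
```

For n = 3 and the bits 1, 1, 0, 1 the first candidate is 3, which is rejected and leaves v = 1, c = 0; two more bits then give the candidate 1, which is accepted.  The patch reaches the same state faster by taking 32 - clz (n) bits in one step before entering the loop.
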
> +
> +__libc_lock_define (extern , __arc4random_lock attribute_hidden)
> +
> +uint32_t
> +__arc4random_uniform (uint32_t upper_bound)
> +{
> +  uint32_t r;
> +  __libc_lock_lock (__arc4random_lock);
> +  r = compute_uniform (upper_bound);
> +  __libc_lock_unlock (__arc4random_lock);
> +  return r;
> +}
> +libc_hidden_def (__arc4random_uniform)
> +weak_alias (__arc4random_uniform, arc4random_uniform)
> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> new file mode 100644
> index 0000000000..af4ffa9860
> --- /dev/null
> +++ b/stdlib/chacha20.c
> @@ -0,0 +1,163 @@
> +/* Generic ChaCha20 implementation (used on arc4random).
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <array_length.h>
> +#include <endian.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <string.h>
> +
> +/* 32-bit stream position, then 96-bit nonce.  */
> +#define CHACHA20_IV_SIZE       16
> +#define CHACHA20_KEY_SIZE      32
> +
> +#define CHACHA20_BLOCK_SIZE     64
> +#define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
> +
> +#define CHACHA20_STATE_LEN     16
> +
> +/* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
> +   Sby RFC8439.  Since the input stream will either zero bytes (initial state)

s/Sby/by/

> +   or the PRNG output itself this step does not add any extra entropy.   */

The plaintext input stream (src argument to chacha20_crypt) is always
zeros, never PRNG output.
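
A short sketch of why the step can be dropped when the input is all zeros, under the assumption that the final step is a plain byte-wise XOR as in RFC 8439.  The helper name xor_with_input is hypothetical and does not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Final step of RFC 8439 encryption: dst = keystream XOR src.  */
static void
xor_with_input (uint8_t *dst, const uint8_t *keystream,
                const uint8_t *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = keystream[i] ^ src[i];
}
```

Because x ^ 0 == x for every byte, calling this with a zero-filled src copies the keystream to dst unchanged, so skipping the XOR yields the same output.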

> +
> +enum chacha20_constants
> +{
> +  CHACHA20_CONSTANT_EXPA = 0x61707865U,
> +  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
> +  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
> +  CHACHA20_CONSTANT_TE_K = 0x6b206574U
> +};
> +
> +static inline uint32_t
> +read_unaligned_32 (const uint8_t *p)
> +{
> +  uint32_t r;
> +  memcpy (&r, p, sizeof (r));
> +  return r;
> +}
> +
> +static inline void
> +write_unaligned_32 (uint8_t *p, uint32_t v)
> +{
> +  memcpy (p, &v, sizeof (v));
> +}
> +
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
> +# define set_state(v)          __builtin_bswap32 ((v))
> +#else
> +# define read_unaligned_le32(p) read_unaligned_32 ((p))
> +# define set_state(v)          (v)
> +#endif
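
For comparison, a byte-order-independent way to express the same little-endian load, shown only as a sketch.  load_le32 is a hypothetical name; the patch instead uses memcpy plus a conditional __builtin_bswap32, which should produce the same value.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a little-endian 32-bit value from
   four bytes, independent of host byte order and alignment.  */
static inline uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```

Loading the ASCII bytes of "expa" (0x65 0x78 0x70 0x61) this way gives 0x61707865, which is the value of CHACHA20_CONSTANT_EXPA above.
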
> +
> +static inline void
> +chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
> +{
> +  state[0]  = CHACHA20_CONSTANT_EXPA;
> +  state[1]  = CHACHA20_CONSTANT_ND_3;
> +  state[2]  = CHACHA20_CONSTANT_2_BY;
> +  state[3]  = CHACHA20_CONSTANT_TE_K;
> +
> +  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
> +  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
> +  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
> +  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
> +  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
> +  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
> +  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
> +  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
> +
> +  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
> +  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
> +  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
> +  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
> +}
> +
> +static inline uint32_t
> +rotl32 (unsigned int shift, uint32_t word)
> +{
> +  return (word << (shift & 31)) | (word >> ((-shift) & 31));
> +}
> +
> +#define QROUND(x0, x1, x2, x3)                         \
> +  do {                                         \
> +   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3));  \
> +   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2));  \
> +   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));  \
> +   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));  \
> +  } while(0)
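
The quarter round can be checked in isolation against the test vector in RFC 8439, section 2.1.1.  The sketch below copies rotl32 and QROUND from the patch as quoted; the checking function qround_matches_rfc8439 is a hypothetical name added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

static inline uint32_t
rotl32 (unsigned int shift, uint32_t word)
{
  return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

#define QROUND(x0, x1, x2, x3)                  \
  do {                                          \
   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3));   \
   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2));   \
   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));   \
   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));   \
  } while (0)

/* Apply one quarter round to the RFC 8439 section 2.1.1 input and
   compare with the expected output listed there.  */
static bool
qround_matches_rfc8439 (void)
{
  uint32_t a = 0x11111111U;
  uint32_t b = 0x01020304U;
  uint32_t c = 0x9b8d6f43U;
  uint32_t d = 0x01234567U;

  QROUND (a, b, c, d);

  return a == 0xea2a92f4U && b == 0xcb1cf8ceU
         && c == 0x4581472eU && d == 0x5881c4bbU;
}
```

A check along these lines could sit next to the block-level tests, since it exercises the argument order of rotl32 (shift first, word second) as well as the macro itself.
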
> +
> +static inline void
> +chacha20_block (uint32_t *state, uint32_t *stream)
> +{
> +  uint32_t x[CHACHA20_STATE_LEN];
> +  memcpy (x, state, sizeof x);
> +
> +  for (int i = 0; i < 20; i += 2)
> +    {
> +      QROUND (x[0], x[4], x[8],  x[12]);
> +      QROUND (x[1], x[5], x[9],  x[13]);
> +      QROUND (x[2], x[6], x[10], x[14]);
> +      QROUND (x[3], x[7], x[11], x[15]);
> +
> +      QROUND (x[0], x[5], x[10], x[15]);
> +      QROUND (x[1], x[6], x[11], x[12]);
> +      QROUND (x[2], x[7], x[8],  x[13]);
> +      QROUND (x[3], x[4], x[9],  x[14]);
> +    }
> +
> +  /* Unroll the loop a bit.  */
> +  for (int i = 0; i < CHACHA20_BLOCK_WORDS / 4; i++)
> +    {
> +      stream[i*4+0] = set_state (x[i*4+0] + state[i*4+0]);
> +      stream[i*4+1] = set_state (x[i*4+1] + state[i*4+1]);
> +      stream[i*4+2] = set_state (x[i*4+2] + state[i*4+2]);
> +      stream[i*4+3] = set_state (x[i*4+3] + state[i*4+3]);
> +    }
> +
> +  state[12]++;
> +}
> +
> +static void
> +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
> +               size_t bytes)
> +{
> +  uint32_t stream[CHACHA20_BLOCK_WORDS];
> +
> +  while (bytes >= CHACHA20_BLOCK_SIZE)
> +    {
> +      chacha20_block (state, stream);
> +#ifdef CHACHA20_XOR_FINAL
> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
> +#endif
> +      memcpy (dst, stream, CHACHA20_BLOCK_SIZE);
> +      bytes -= CHACHA20_BLOCK_SIZE;
> +      dst += CHACHA20_BLOCK_SIZE;
> +      src += CHACHA20_BLOCK_SIZE;
> +    }
> +  if (bytes != 0)
> +    {
> +      chacha20_block (state, stream);
> +#ifdef CHACHA20_XOR_FINAL
> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
> +#endif
> +      memcpy (dst, stream, bytes);
> +    }
> +}
> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
> index bf7cd438e1..f2b0c83c12 100644
> --- a/stdlib/stdlib.h
> +++ b/stdlib/stdlib.h
> @@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
>  extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
>
>  # ifdef __USE_MISC
> +#  include <bits/stdint-uintn.h>
>  /* Data structure for communication with thread safe versions.  This
>     type is to be regarded as opaque.  It's only exported because users
>     have to allocate objects of this type.  */
> @@ -533,6 +534,19 @@ extern int seed48_r (unsigned short int __seed16v[3],
>  extern int lcong48_r (unsigned short int __param[7],
>                       struct drand48_data *__buffer)
>       __THROW __nonnull ((1, 2));
> +
> +/* Return a random integer between zero and 2**31-1 (inclusive).  */

This should be 2**32-1; arc4random returns a full uint32_t.


> +extern uint32_t arc4random (void)
> +     __THROW __wur;
> +
> +/* Fill the buffer with random data.  */
> +extern void arc4random_buf (void *__buf, size_t __size)
> +     __THROW __nonnull ((1));
> +
> +/* Return a random number between zero (inclusive) and the specified
> +   limit (exclusive).  */
> +extern uint32_t arc4random_uniform (uint32_t __upper_bound)
> +     __THROW __wur;
>  # endif        /* Use misc.  */
>  #endif /* Use misc or X/Open.  */
>
> diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
> index 2104efeb54..f4882a9ffd 100644
> --- a/sysdeps/generic/not-cancel.h
> +++ b/sysdeps/generic/not-cancel.h
> @@ -48,5 +48,7 @@
>    (void) __writev (fd, iov, n)
>  #define __fcntl64_nocancel(fd, cmd, ...) \
>    __fcntl64 (fd, cmd, __VA_ARGS__)
> +#define __getrandomn_nocancel(buf, size, flags) \
> +  __getrandom (buf, size, flags)
>
>  #endif /* NOT_CANCEL_H  */
> diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
> index 4dc87e9061..7bd565103b 100644
> --- a/sysdeps/mach/hurd/i386/libc.abilist
> +++ b/sysdeps/mach/hurd/i386/libc.abilist
> @@ -2289,6 +2289,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 close_range F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
> index 6ec92ced84..39edfe76b6 100644
> --- a/sysdeps/mach/hurd/not-cancel.h
> +++ b/sysdeps/mach/hurd/not-cancel.h
> @@ -74,6 +74,9 @@ __typeof (__fcntl) __fcntl_nocancel;
>  #define __fcntl64_nocancel(...) \
>    __fcntl_nocancel (__VA_ARGS__)
>
> +#define __getrandomn_nocancel(buf, size, flags) \
> +  __getrandom (buf, size, flags)
> +
>  #if IS_IN (libc)
>  hidden_proto (__close_nocancel)
>  hidden_proto (__close_nocancel_nostatus)
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> index 1b63d9e447..f8f38bb205 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> @@ -2616,3 +2616,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> index e7e4cf7d2a..9de1726de0 100644
> --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> @@ -2713,6 +2713,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
> index bc3d228e31..16e2532838 100644
> --- a/sysdeps/unix/sysv/linux/arc/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
> @@ -2377,3 +2377,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> index db7039c4ab..ae9e465088 100644
> --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> @@ -496,6 +496,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> index d2add4fb49..b669f43194 100644
> --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> @@ -493,6 +493,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>  GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
> index 355d72a30c..42daa90248 100644
> --- a/sysdeps/unix/sysv/linux/csky/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
> @@ -2652,3 +2652,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> index 3df39bb28c..090be20f53 100644
> --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> @@ -2601,6 +2601,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
> index c4da358f80..6b7cf064bb 100644
> --- a/sysdeps/unix/sysv/linux/i386/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
> @@ -2785,6 +2785,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> index 241bac70ea..3e766f64dd 100644
> --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> @@ -2551,6 +2551,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> index 78bf372b72..c0b99199a8 100644
> --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> @@ -497,6 +497,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _Exit F
>  GLIBC_2.4 _IO_2_1_stderr_ D 0x98
>  GLIBC_2.4 _IO_2_1_stdin_ D 0x98
> diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> index 00df5c901f..4d0be7c86d 100644
> --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> @@ -2728,6 +2728,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> index e8118569c3..b944680ede 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> @@ -2701,3 +2701,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> index c0d2373e64..28f7d19983 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> @@ -2698,3 +2698,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> index 2d0fd04f54..3da7cdaca5 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> @@ -2693,6 +2693,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> index e39ccfb312..9fe87f15be 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> @@ -2691,6 +2691,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> index 1e900f86e4..c14fca2111 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> @@ -2699,6 +2699,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> index 9145ba7931..a363830226 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> @@ -2602,6 +2602,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> index e95d60d926..89b6f98667 100644
> --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> @@ -2740,3 +2740,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
> index 75b9e0ee1e..be5df35927 100644
> --- a/sysdeps/unix/sysv/linux/not-cancel.h
> +++ b/sysdeps/unix/sysv/linux/not-cancel.h
> @@ -67,6 +67,13 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
>    INTERNAL_SYSCALL_CALL (writev, fd, iov, iovcnt);
>  }
>
> +static inline int
> +__getrandomn_nocancel (void *buf, size_t buflen, unsigned int flags)
> +{
> +  return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
> +}
> +
> +
>  /* Uncancelable fcntl.  */
>  __typeof (__fcntl) __fcntl64_nocancel;
>
> diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> index ca934e374b..94c0ff9526 100644
> --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> @@ -2123,3 +2123,6 @@ GLIBC_2.35 wprintf F
>  GLIBC_2.35 write F
>  GLIBC_2.35 writev F
>  GLIBC_2.35 wscanf F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> index 3820b9f235..d6188de00b 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> @@ -2755,6 +2755,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> index 464dc27fcd..8201230059 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> @@ -2788,6 +2788,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> index 2f7e58747f..623505d783 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> @@ -2510,6 +2510,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> index 4f3043d913..23b0d83408 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> @@ -2812,3 +2812,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index 84b6ac815a..a72e8ed9cc 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2379,3 +2379,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index 4d5c19c56a..f3faecc2ae 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2579,3 +2579,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> index 7c5ee8d569..105e5a9231 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> @@ -2753,6 +2753,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> index 50de0b46cf..c08c6c8301 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> @@ -2547,6 +2547,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> index 66fba013ca..8ec1005644 100644
> --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> @@ -2608,6 +2608,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> index 38703f8aa0..5d776576f9 100644
> --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> @@ -2605,6 +2605,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> index 6df55eb765..f5f07f612e 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> @@ -2748,6 +2748,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 _IO_fprintf F
>  GLIBC_2.4 _IO_printf F
>  GLIBC_2.4 _IO_sprintf F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> index b90569d881..be687ebe02 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> @@ -2574,6 +2574,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> index e88b0f101f..7f456fbb55 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> @@ -2525,6 +2525,9 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
>  GLIBC_2.4 __confstr_chk F
>  GLIBC_2.4 __fgets_chk F
>  GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> index e0755272eb..c737201248 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> @@ -2631,3 +2631,6 @@ GLIBC_2.35 __memcmpeq F
>  GLIBC_2.35 _dl_find_object F
>  GLIBC_2.35 epoll_pwait2 F
>  GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 arc4random F
> +GLIBC_2.36 arc4random_buf F
> +GLIBC_2.36 arc4random_uniform F
> --
> 2.32.0

 - Mark


* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-22 13:54   ` Yann Droneaud
@ 2022-04-25 12:15     ` Adhemerval Zanella
  2022-04-25 12:20       ` Adhemerval Zanella
  0 siblings, 1 reply; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-25 12:15 UTC (permalink / raw)
  To: Yann Droneaud, libc-alpha; +Cc: Florian Weimer



On 22/04/2022 10:54, Yann Droneaud wrote:
> Le 19/04/2022 à 23:28, Adhemerval Zanella via Libc-alpha a écrit :
>> The implementation is based on scalar Chacha20, with global cache and
>> locking.  It uses getrandom or /dev/urandom as fallback to get the
>> initial entropy, and reseeds the internal state on every 16MB of
>> consumed buffer.
>>
>> It maintains an internal buffer which consumes at maximum one page on
>> most systems (assuming minimum of 4k pages).  The internal buf optimizes
>> the cipher encrypt calls, by amortize arc4random calls (where both
>> function call and locks cost are the dominating factor).
>>
>> The ChaCha20 implementation is based on the RFC8439 [1], with last
>> step that XOR with the input omited.  Since the input stream will either
>> zero bytes (initial state) or the PRNG output itself this step does not
>> add any extra entropy.
> 
> 
> This can also state the implementation is following OpenBSD arc4random current implementation.
> 
> 

Agreed, it is worth adding.

>> +
>> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
>> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
>> +   number of bytes until a reseed.
>> +
>> +   Both the initial seed an reseed tries to obtain entropy from the kernel
> 
> an ->  and

Ack.

>> +
>> +static void
>> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
>> +{
>> +  memset (state->buf, 0, sizeof state->buf);
> 
> There's no need to clear buf as call to chacha20_crypt() will overwrite it (since it doesn't XOR with it anymore).
> 
> See https://github.com/openbsd/src/blob/master/lib/libc/crypt/arc4random.c#L121

Ack, I have removed it.


>> +
>> +      /* Extract a bit and append it to c.  c remains less than v and
>> +         thus 2**33.  */
>> +      c = (c << 1) | (bits & 1);
>> +      bits >>= 1;
>> +      --bits_length;
>> +
>> +      /* At this point, c is uniformly distributed on [0, v) again,
>> +         and v < 2n < 2**33.  */
>> +    }
> 
> I'm not familiar with this method.
> 
> It's not one reviewed at https://www.pcg-random.org/posts/bounded-rands.html
> 
> In this patch I used what's called by PCG author,the "Bitmask with Rejection (Unbiased) — Apple's Method"
> 
> https://github.com/Parrot-Developers/libfutils/commit/9dc7243ae2f2059b4590a702be2ca9c03578067f
> 
> I like it because it doesn't uses modulo at all :)
> 
> But the OpenBSD's arc4random_uniform() is even more simple in term of C code.
> 

You can find the reference paper on arXiv [1].  The main advantage of this
method is that the unit of randomness is not the uniform random variable
(uint32_t), but a random bit.  It optimizes the internal buffer sampling by
initially consuming a 32-bit random variable and then sampling byte by byte.
Depending on the upper bound requested, it might lead to better CPU utilization.

From the article:

  But unexpectedly, it turns out that the extra buffering inherent in consuming
  randomness random-bit-by-random-bit, although time consuming, is more than
  compensated by the increased efficiency in using random bits compared with most
  common methods.

This is especially true if you consider that both ChaCha20 block generation and
getting kernel entropy (through either getrandom or /dev/urandom) are far
more time consuming than bit twiddling.

[1] https://arxiv.org/pdf/1304.1916.pdf
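
To make the bit-by-bit idea concrete, here is a minimal sketch of the loop
described in the paper, not the code from the patch.  All sketch_* names are
hypothetical, and the bit source is a toy xorshift32 generator standing in
for the arc4random buffer, used only to keep the example self-contained.  The
invariant is the one from the quoted comment: c stays uniformly distributed
on [0, v) and v stays below 2n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real entropy source: a xorshift32
   generator refills a 32-bit reservoir that is consumed one bit at a
   time.  It is deterministic and NOT suitable for real use.  */
static uint32_t sketch_prng = 0x9e3779b9u;
static uint32_t sketch_bits;
static unsigned int sketch_bits_left;

static uint32_t
sketch_next_bit (void)
{
  if (sketch_bits_left == 0)
    {
      sketch_prng ^= sketch_prng << 13;
      sketch_prng ^= sketch_prng >> 17;
      sketch_prng ^= sketch_prng << 5;
      sketch_bits = sketch_prng;
      sketch_bits_left = 32;
    }
  uint32_t bit = sketch_bits & 1;
  sketch_bits >>= 1;
  --sketch_bits_left;
  return bit;
}

/* Return a value in [0, n) without modulo or division.  64-bit
   variables are used because v can reach 2**33 for large n.  */
static uint32_t
sketch_uniform (uint32_t n)
{
  if (n <= 1)
    return 0;

  uint64_t v = 1;   /* Size of the current range.  */
  uint64_t c = 0;   /* Uniform on [0, v).  */
  for (;;)
    {
      /* Append one random bit: the range doubles.  */
      v <<= 1;
      c = (c << 1) | sketch_next_bit ();
      if (v >= n)
        {
          if (c < n)
            return (uint32_t) c;
          /* Reject, but keep the leftover randomness in [0, v - n).  */
          v -= n;
          c -= n;
        }
      assert (c < v && v < 2 * (uint64_t) n);
    }
}
```

A rejection does not throw away the bits already drawn, which is where the
efficiency claimed in the article comes from.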

> 
>> +}
>> +
>> +__libc_lock_define (extern , __arc4random_lock attribute_hidden)
>> +
>> +uint32_t
>> +__arc4random_uniform (uint32_t upper_bound)
>> +{
>> +  uint32_t r;
>> +  __libc_lock_lock (__arc4random_lock);
>> +  r = compute_uniform (upper_bound);
>> +  __libc_lock_unlock (__arc4random_lock);
>> +  return r;
>> +}
>> +libc_hidden_def (__arc4random_uniform)
>> +weak_alias (__arc4random_uniform, arc4random_uniform)
> 
> 


* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-25 12:15     ` Adhemerval Zanella
@ 2022-04-25 12:20       ` Adhemerval Zanella
  0 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-25 12:20 UTC (permalink / raw)
  To: Yann Droneaud, libc-alpha; +Cc: Florian Weimer



On 25/04/2022 09:15, Adhemerval Zanella wrote:
> 
> 
> On 22/04/2022 10:54, Yann Droneaud wrote:
>> Le 19/04/2022 à 23:28, Adhemerval Zanella via Libc-alpha a écrit :
>>> The implementation is based on scalar Chacha20, with global cache and
>>> locking.  It uses getrandom or /dev/urandom as fallback to get the
>>> initial entropy, and reseeds the internal state on every 16MB of
>>> consumed buffer.
>>>
>>> It maintains an internal buffer which consumes at maximum one page on
>>> most systems (assuming minimum of 4k pages).  The internal buf optimizes
>>> the cipher encrypt calls, by amortize arc4random calls (where both
>>> function call and locks cost are the dominating factor).
>>>
>>> The ChaCha20 implementation is based on the RFC8439 [1], with last
>>> step that XOR with the input omited.  Since the input stream will either
>>> zero bytes (initial state) or the PRNG output itself this step does not
>>> add any extra entropy.
>>
>>
>> This can also state the implementation is following OpenBSD arc4random current implementation.
>>
>>
> 
> Agree, it is worth to add it.
> 
>>> +
>>> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
>>> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
>>> +   number of bytes until a reseed.
>>> +
>>> +   Both the initial seed an reseed tries to obtain entropy from the kernel
>>
>> an ->  and
> 
> Ack.
> 
>>> +
>>> +static void
>>> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
>>> +{
>>> +  memset (state->buf, 0, sizeof state->buf);
>>
>> There's no need to clear buf as call to chacha20_crypt() will overwrite it (since it doesn't XOR with it anymore).
>>
>> See https://github.com/openbsd/src/blob/master/lib/libc/crypt/arc4random.c#L121
> 
> Ack, I have removed it.
> 
> 
>>> +
>>> +      /* Extract a bit and append it to c.  c remains less than v and
>>> +         thus 2**33.  */
>>> +      c = (c << 1) | (bits & 1);
>>> +      bits >>= 1;
>>> +      --bits_length;
>>> +
>>> +      /* At this point, c is uniformly distributed on [0, v) again,
>>> +         and v < 2n < 2**33.  */
>>> +    }
>>
>> I'm not familiar with this method.
>>
>> It's not one reviewed at https://www.pcg-random.org/posts/bounded-rands.html
>>
>> In this patch I used what's called by PCG author,the "Bitmask with Rejection (Unbiased) — Apple's Method"
>>
>> https://github.com/Parrot-Developers/libfutils/commit/9dc7243ae2f2059b4590a702be2ca9c03578067f
>>
>> I like it because it doesn't uses modulo at all :)
>>
>> But the OpenBSD's arc4random_uniform() is even more simple in term of C code.
>>
> 
> You can find the reference paper on arxiv [1].  The main advantage of this
> method is the that the unit of randomness is not the uniform random variable
> (uint32_t), but a random bit.  It optimizes the internal buffer sampling by
> initially consuming a 32-bit random variable and then sampling byte per byte.
> Depending of the upper bound requested, it might lead to better CPU utilization.
> 
> From the article:
> 
>   But unexpectedly, it turns out that the extra buffering inherent in consuming
>   randomness random-bit-by-random-bit, although time consuming, is more than
>   compensated by the increased efficiency in using random bits compared with most
>   common methods.
> 
> It is specially true if you consider that both chacha20 block generation and
> getting kernel entropy (through either getrandom or /dev/urandom) are way
> more time consuming than bit twiddling. 
> 
> [1] https://arxiv.org/pdf/1304.1916.pdf
> 

You can also see that this method does not use modulo either.  It does use
arc4random_buf internally, which might add some overhead.  One possible
optimization would be to add an internal function that consumes only one byte,
but I am not sure whether this really pays off.


* Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  2022-04-25  2:22   ` Mark Harris
@ 2022-04-25 12:26     ` Adhemerval Zanella
  0 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-25 12:26 UTC (permalink / raw)
  To: Mark Harris; +Cc: libc-alpha, Florian Weimer



On 24/04/2022 23:22, Mark Harris wrote:
> On Tue, Apr 19, 2022 at 2:29 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> The implementation is based on scalar Chacha20, with global cache and
>> locking.  It uses getrandom or /dev/urandom as fallback to get the
>> initial entropy, and reseeds the internal state on every 16MB of
>> consumed buffer.
>>
>> It maintains an internal buffer which consumes at maximum one page on
>> most systems (assuming minimum of 4k pages).  The internal buf optimizes
>> the cipher encrypt calls, by amortize arc4random calls (where both
> 
> s/amortize/amortizing/
> 

Ack.

>> function call and locks cost are the dominating factor).
> 
> s/locks/lock/

Ack.

> 
>>
>> The ChaCha20 implementation is based on the RFC8439 [1], with last
>> step that XOR with the input omited.  Since the input stream will either
>> zero bytes (initial state) or the PRNG output itself this step does not
>> add any extra entropy.
> 
> The src argument to chacha20_crypt is always zeros, never PRNG output.
> Perhaps it would be clearer to say something like this:
> 
> The ChaCha20 implementation is based on RFC8439 [1], omitting the final
> XOR of the keystream with the plaintext because the plaintext is a
> stream of zeros.

Ack, I have also added a remark that it follows the OpenBSD strategy:

  The ChaCha20 implementation is based on RFC8439 [1], omitting the final                       
  XOR of the keystream with the plaintext because the plaintext is a                            
  stream of zeros.  This strategy is similar to what OpenBSD arc4random                         
  does.  
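
As a small illustration of why the final step can be dropped, the sketch
below (hypothetical sketch_* helpers, not code from the patch) applies the
RFC 8439 XOR to an all-zero plaintext and checks that the output equals the
keystream byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the final RFC 8439 step,
   dst = keystream XOR src.  */
static void
sketch_xor_final (uint8_t *dst, const uint8_t *keystream,
                  const uint8_t *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = keystream[i] ^ src[i];
}

/* Return 1 if XORing up to one 64-byte block of keystream with a
   zero plaintext leaves the keystream unchanged.  */
static int
sketch_xor_is_noop (const uint8_t *keystream, size_t len)
{
  uint8_t zeros[64] = { 0 };
  uint8_t out[64];
  assert (len <= sizeof out);
  sketch_xor_final (out, keystream, zeros, len);
  return memcmp (out, keystream, len) == 0;
}
```

Since x ^ 0 == x for every byte, copying the keystream directly gives the
same result as the full encrypt step.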

>> diff --git a/include/stdlib.h b/include/stdlib.h
>> index 1c6f70b082..055f9d2965 100644
>> --- a/include/stdlib.h
>> +++ b/include/stdlib.h
>> @@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
>>  libc_hidden_proto (grantpt)
>>  libc_hidden_proto (unlockpt)
>>
>> +__typeof (arc4random) __arc4random;
>> +libc_hidden_proto (__arc4random);
>> +__typeof (arc4random_buf) __arc4random_buf;
>> +libc_hidden_proto (__arc4random_buf);
>> +__typeof (arc4random_uniform) __arc4random_uniform;
>> +libc_hidden_proto (__arc4random_uniform);
>> +extern void __arc4random_buf_internal (void *buffer, size_t len)
>> +     attribute_hidden;
>> +/* Called from the fork function to reinitialize the internal lock in thte
> 
> s/thte/the/

Ack.


>> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the
>> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum
>> +   number of bytes until a reseed.
>> +
>> +   Both the initial seed an reseed tries to obtain entropy from the kernel
>> +   and abort the process if none could be obtained.
>> +
>> +   The state 'buf' improves the usage of the cipher call, allowing to call
>> +   optimized implementations (if the archictecture provides it) and optimize
>> +   arc4random calls (since only multiple call it will encrypt the next block).
>> + */
>> +
>> +/* Maximum number bytes until reseed (16 MB).  */
>> +#define CHACHE_RESEED_SIZE     (16 * 1024 * 1024)

> Should this be CHACHA20_RESEED_SIZE?

It should, I changed it.

> 
>> +/* Internal buffer size in bytes (1KB).  */
>> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)
> 
> 8 * 64 = 512; should this be (16 * CHACHA20_BLOCK_SIZE)?

In fact, I updated the comment instead.  I changed it to 512 in v3 to
optimize the TCB buffer size for the lockless optimization.


>> +/* Fork detection is done by checking if MADV_WIPEONFORK supported.  If not
>> +   the fork callback will reset the state on the fork call.  It does not
>> +   handle direct clone calls, nor vfork or _Fork (arc4random is not
>> +   async-signal-safe due the internal lock usage).  */
>> +static void
>> +arc4random_init (uint8_t *buf, size_t len)
> 
> len is not used in this function.

Indeed, I removed it.

> 
>> +{
>> +  state = __mmap (NULL, sizeof (struct arc4random_state),
>> +                 PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>> +  if (state == MAP_FAILED)
>> +    arc4random_allocate_failure ();
>> +
>> +#ifdef MADV_WIPEONFORK
>> +  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
>> +  if (r == 0)
>> +    __arc4random_wipeonfork = true;
>> +  else if (errno != EINVAL)
>> +    arc4random_allocate_failure ();
>> +#endif
>> +
>> +  chacha20_init (state->ctx, buf, buf + CHACHA20_KEY_SIZE);
>> +}
>> +
>> +#define min(x,y) (((x) > (y)) ? (y) : (x))
>> +
>> +static void
>> +arc4random_rekey (uint8_t *rnd, size_t rndlen)
>> +{
>> +  memset (state->buf, 0, sizeof state->buf);
>> +  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
>> +
>> +  /* Mix some extra entropy if provided.  */
>> +  if (rnd != NULL)
>> +    {
>> +      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
>> +      for (size_t i = 0; i < m; i++)
>> +       state->buf[i] ^= rnd[i];
>> +    }
>> +
>> +  /* Immediately reinit for backtracking resistance.  */
>> +  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
>> +  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
>> +  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
>> +}
>> +
>> +static void
>> +arc4random_getentropy (uint8_t *rnd, size_t len)
>> +{
>> +  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
>> +    return;
>> +
>> +  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);
> 
> Should this be O_RDONLY | O_CLOEXEC?

It should, I have changed it.
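
For reference, a minimal sketch of the /dev/urandom fallback path with
O_CLOEXEC, written against the public POSIX calls rather than the internal
__open64_nocancel and __read_nocancel wrappers used in the patch.  The
function name sketch_read_urandom and its 0 / -1 return convention are
assumptions for illustration; the patch aborts on failure instead, and it
tries getrandom first.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill BUF with LEN bytes from /dev/urandom.  Returns 0 on success
   and -1 on failure.  */
static int
sketch_read_urandom (void *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  /* O_CLOEXEC keeps the descriptor from leaking into a child that
     calls exec while the read is in progress.  */
  int fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC);
  if (fd == -1)
    return -1;

  unsigned char *p = buf;
  unsigned char *end = p + len;
  int result = 0;
  while (p < end)
    {
      ssize_t ret = read (fd, p, (size_t) (end - p));
      if (ret == -1 && errno == EINTR)
        continue;               /* Interrupted by a signal: retry.  */
      if (ret <= 0)
        {
          result = -1;          /* Error or unexpected end of file.  */
          break;
        }
      p += ret;                 /* Short read: keep going.  */
    }

  /* The sketch ignores the close result; how to treat a close
     failure is left to the caller's policy.  */
  close (fd);
  return result;
}
```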

> 
>> +  if (fd != -1)
>> +    {
>> +      unsigned char *p = rnd;
>> +      unsigned char *end = p + len;
> 
> uint8_t * would be consistent with the declaration of rnd.

Ack.

> 
>> +      do
>> +       {
>> +         ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
>> +         if (ret <= 0)
>> +           arc4random_getrandom_failure ();
>> +         p += ret;
>> +       }
>> +      while (p < end);
>> +
>> +      if (__close_nocancel (fd) != 0)
> 
> Should this be == 0?

Ack.

>> +static uint32_t
>> +compute_uniform (uint32_t n)
>> +{
>> +  if (n <= 1)
>> +    /* There is no valid return value for a zero limit, and 0 is the
>> +       only possible result for limit 1.  */
>> +    return 0;
>> +
>> +  /* The bits variable serves as a source for bits.  Prefetch the
>> +     minimum number of bytes needed.  */
>> +  unsigned count = byte_count (n);
> 
> uint32_t would be consistent with the declaration of byte_count.

Ack.

> 
>> +  uint32_t bits_length = count * CHAR_BIT;
>> +  uint32_t bits;
>> +  random_bytes (&bits, count);
>> +
>> +  /* Powers of two are easy.  */
>> +  if (powerof2 (n))
>> +    return bits & (n - 1);
>> +
>> +  /* The general case.  This algorithm follows Jérémie Lumbroso,
>> +     Optimal Discrete Uniform Generation from Coin Flips, and
>> +     Applications (2013), who credits Donald E. Knuth and Andrew
>> +     C. Yao, The complexity of nonuniform random number generation
>> +     (1976), for solving the general case.
>> +
>> +     The implementation below unrolls the initialization stage of the
>> +     loop, where v is less than n.  */
>> +
>> +  /* Use 64-bit variables even though the intermediate results are
>> +     never larger that 33 bits.  This ensures the code easier to
> 
> s/that/than/
> s/the code/that the code is/

Ack.

>> +
>> +/* 32-bit stream position, then 96-bit nonce.  */
>> +#define CHACHA20_IV_SIZE       16
>> +#define CHACHA20_KEY_SIZE      32
>> +
>> +#define CHACHA20_BLOCK_SIZE     64
>> +#define CHACHA20_BLOCK_WORDS    (CHACHA20_BLOCK_SIZE / sizeof (uint32_t))
>> +
>> +#define CHACHA20_STATE_LEN     16
>> +
>> +/* Defining CHACHA20_XOR_FINAL issues the final XOR using the input as defined
>> +   Sby RFC8439.  Since the input stream will either zero bytes (initial state)
> 
> s/Sby/by/

Ack.

> 
>> +   or the PRNG output itself this step does not add any extra entropy.   */
> 
> The plaintext input stream (src argument to chacha20_crypt) is always
> zeros, never PRNG output.

Indeed, I have changed it to the suggestion you gave above.

> 
>> +
>> +enum chacha20_constants
>> +{
>> +  CHACHA20_CONSTANT_EXPA = 0x61707865U,
>> +  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
>> +  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
>> +  CHACHA20_CONSTANT_TE_K = 0x6b206574U
>> +};
>> +
>> +static inline uint32_t
>> +read_unaligned_32 (const uint8_t *p)
>> +{
>> +  uint32_t r;
>> +  memcpy (&r, p, sizeof (r));
>> +  return r;
>> +}
>> +
>> +static inline void
>> +write_unaligned_32 (uint8_t *p, uint32_t v)
>> +{
>> +  memcpy (p, &v, sizeof (v));
>> +}
>> +
>> +#if __BYTE_ORDER == __BIG_ENDIAN
>> +# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
>> +# define set_state(v)          __builtin_bswap32 ((v))
>> +#else
>> +# define read_unaligned_le32(p) read_unaligned_32 ((p))
>> +# define set_state(v)          (v)
>> +#endif
>> +
>> +static inline void
>> +chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
>> +{
>> +  state[0]  = CHACHA20_CONSTANT_EXPA;
>> +  state[1]  = CHACHA20_CONSTANT_ND_3;
>> +  state[2]  = CHACHA20_CONSTANT_2_BY;
>> +  state[3]  = CHACHA20_CONSTANT_TE_K;
>> +
>> +  state[4]  = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
>> +  state[5]  = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
>> +  state[6]  = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
>> +  state[7]  = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
>> +  state[8]  = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
>> +  state[9]  = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
>> +  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
>> +  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
>> +
>> +  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
>> +  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
>> +  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
>> +  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
>> +}
>> +
>> +static inline uint32_t
>> +rotl32 (unsigned int shift, uint32_t word)
>> +{
>> +  return (word << (shift & 31)) | (word >> ((-shift) & 31));
>> +}
>> +
>> +#define QROUND(x0, x1, x2, x3)                         \
>> +  do {                                         \
>> +   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3));  \
>> +   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2));  \
>> +   x0 = x0 + x1; x3 = rotl32 (8,  (x0 ^ x3));  \
>> +   x2 = x2 + x3; x1 = rotl32 (7,  (x1 ^ x2));  \
>> +  } while(0)
>> +
>> +static inline void
>> +chacha20_block (uint32_t *state, uint32_t *stream)
>> +{
>> +  uint32_t x[CHACHA20_STATE_LEN];
>> +  memcpy (x, state, sizeof x);
>> +
>> +  for (int i = 0; i < 20; i += 2)
>> +    {
>> +      QROUND (x[0], x[4], x[8],  x[12]);
>> +      QROUND (x[1], x[5], x[9],  x[13]);
>> +      QROUND (x[2], x[6], x[10], x[14]);
>> +      QROUND (x[3], x[7], x[11], x[15]);
>> +
>> +      QROUND (x[0], x[5], x[10], x[15]);
>> +      QROUND (x[1], x[6], x[11], x[12]);
>> +      QROUND (x[2], x[7], x[8],  x[13]);
>> +      QROUND (x[3], x[4], x[9],  x[14]);
>> +    }
>> +
>> +  /* Unroll the loop a bit.  */
>> +  for (int i = 0; i < CHACHA20_BLOCK_WORDS / 4; i++)
>> +    {
>> +      stream[i*4+0] = set_state (x[i*4+0] + state[i*4+0]);
>> +      stream[i*4+1] = set_state (x[i*4+1] + state[i*4+1]);
>> +      stream[i*4+2] = set_state (x[i*4+2] + state[i*4+2]);
>> +      stream[i*4+3] = set_state (x[i*4+3] + state[i*4+3]);
>> +    }
>> +
>> +  state[12]++;
>> +}
>> +
>> +static void
>> +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
>> +               size_t bytes)
>> +{
>> +  uint32_t stream[CHACHA20_BLOCK_WORDS];
>> +
>> +  while (bytes >= CHACHA20_BLOCK_SIZE)
>> +    {
>> +      chacha20_block (state, stream);
>> +#ifdef CHACHA20_XOR_FINAL
>> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
>> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
>> +#endif
>> +      memcpy (dst, stream, CHACHA20_BLOCK_SIZE);
>> +      bytes -= CHACHA20_BLOCK_SIZE;
>> +      dst += CHACHA20_BLOCK_SIZE;
>> +      src += CHACHA20_BLOCK_SIZE;
>> +    }
>> +  if (bytes != 0)
>> +    {
>> +      chacha20_block (state, stream);
>> +#ifdef CHACHA20_XOR_FINAL
>> +      for (int i = 0; i < CHACHA20_BLOCK_WORDS; i++)
>> +       stream[i] ^= read_unaligned_32 (&src[i * sizeof (uint32_t)]);
>> +#endif
>> +      memcpy (dst, stream, bytes);
>> +    }
>> +}
>> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
>> index bf7cd438e1..f2b0c83c12 100644
>> --- a/stdlib/stdlib.h
>> +++ b/stdlib/stdlib.h
>> @@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
>>  extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
>>
>>  # ifdef __USE_MISC
>> +#  include <bits/stdint-uintn.h>
>>  /* Data structure for communication with thread safe versions.  This
>>     type is to be regarded as opaque.  It's only exported because users
>>     have to allocate objects of this type.  */
>> @@ -533,6 +534,19 @@ extern int seed48_r (unsigned short int __seed16v[3],
>>  extern int lcong48_r (unsigned short int __param[7],
>>                       struct drand48_data *__buffer)
>>       __THROW __nonnull ((1, 2));
>> +
>> +/* Return a random integer between zero and 2**31-1 (inclusive).  */
> 
> 2**32-1
> 

Ack.


* Re: [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random
  2022-04-22 16:02   ` Yann Droneaud
@ 2022-04-25 12:36     ` Adhemerval Zanella
  0 siblings, 0 replies; 22+ messages in thread
From: Adhemerval Zanella @ 2022-04-25 12:36 UTC (permalink / raw)
  To: Yann Droneaud, libc-alpha



On 22/04/2022 13:02, Yann Droneaud wrote:
> Le 19/04/2022 à 23:28, Adhemerval Zanella via Libc-alpha a écrit :
>> The arc4random state is moved to TCB, so there is no allocation
>> failure.  It adds about 592 bytes struct pthread.
> 
> +to struct pthread ?

Ack.


>> +/* Reinit the thread context by reseeding the cipher state with kernel
>> +   entropy.  */
>> +static struct arc4random_state *
>> +arc4random_check_stir (size_t len)
>>  {
>> -  uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
>> -  arc4random_getentropy (rnd, sizeof rnd);
>> +  struct arc4random_state *state = &__glibc_tls_internal()->rnd_state;
>>  
>> -  if (state == NULL)
>> -    arc4random_init (rnd, sizeof rnd);
>> -  else
>> -    arc4random_rekey (rnd, sizeof rnd);
>> +  if (state->count < len || state->count == -1)
>> +    {
>> +      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
>> +      arc4random_getentropy (rnd, sizeof rnd);
>>  
>> -  explicit_bzero (rnd, sizeof rnd);
>> +      if (state->count > CHACHE_RESEED_SIZE)
>> +	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> 
> for case state->count == -1, chacha20_init() should be called (first) instead of arc4random_rekey()
> as chacha20 context is not setup and the buffer contains no keystream yet 
> 
>     if (state->count == -1)
>         chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> 
> 

Indeed, I forgot to change it. 


>>  static inline struct tls_internal_t *
>>  __glibc_tls_internal (void)
>>  {
>> @@ -31,8 +44,18 @@ __glibc_tls_internal (void)
>>  static inline void
>>  __glibc_tls_internal_free (void)
>>  {
>> -  free (THREAD_SELF->tls_state.strsignal_buf);
>> -  free (THREAD_SELF->tls_state.strerror_l_buf);
>> +  struct pthread *self = THREAD_SELF;
>> +  free (self->tls_state.strsignal_buf);
>> +  free (self->tls_state.strerror_l_buf);
>> +  if (self->tls_state.rnd_state.count != -1)
>> +    {
>> +      /* Clear any lingering random state prior so if the thread stack
>> +	 is cached it won't leak any data.  */
>> +      memset (&self->tls_state.rnd_state, 0,
>> +	      sizeof self->tls_state.rnd_state);
>> +      /* Force key init on created threads.  */
>> +      self->tls_state.rnd_state.count = -1;
> 
> setting to -1 is probably not needed, as it will be set by the init function.

Indeed, I removed it.


Thread overview: 22+ messages
2022-04-19 21:28 [PATCH v3 0/9] Add arc4random support Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
2022-04-19 21:52   ` H.J. Lu
2022-04-20 12:38     ` Adhemerval Zanella
2022-04-22 13:54   ` Yann Droneaud
2022-04-25 12:15     ` Adhemerval Zanella
2022-04-25 12:20       ` Adhemerval Zanella
2022-04-25  2:22   ` Mark Harris
2022-04-25 12:26     ` Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 2/9] stdlib: Add arc4random tests Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 3/9] benchtests: Add arc4random benchtest Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 4/9] aarch64: Add optimized chacha20 Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 5/9] x86: Add SSE2 " Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 6/9] x86: Add AVX2 " Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 7/9] powerpc64: Add " Adhemerval Zanella
2022-04-20 18:38   ` Paul E Murphy
2022-04-20 19:23     ` Adhemerval Zanella
2022-04-22 21:09       ` Paul E Murphy
2022-04-19 21:28 ` [PATCH v3 8/9] s390x: " Adhemerval Zanella
2022-04-19 21:28 ` [PATCH v3 9/9] stdlib: Add TLS optimization to arc4random Adhemerval Zanella
2022-04-22 16:02   ` Yann Droneaud
2022-04-25 12:36     ` Adhemerval Zanella
