Re: [PATCH v8 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Florian Weimer <fweimer@redhat.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org,  Yann Droneaud <ydroneaud@opteya.com>
Subject: Re: [PATCH v8 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
Date: Tue, 12 Jul 2022 10:57:18 +0200	[thread overview]
Message-ID: <871quqvhch.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <20220629213428.3065430-2-adhemerval.zanella@linaro.org> (Adhemerval Zanella's message of "Wed, 29 Jun 2022 18:34:20 -0300")

* Adhemerval Zanella:

> diff --git a/NEWS b/NEWS
> index b0a3d7e512..f9dc316ead 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -46,6 +46,10 @@ Major new features:
>    to more flexibly configure and operate on filesystem mounts.  The new
>    mount APIs are specifically designed to work with namespaces.
>  
> +* The functions arc4random, arc4random_buf, and arc4random_uniform have been
> +  added.  The functions use a cryptographic pseudo-random number generator
> +  based on ChaCha20 initilized with entropy from the kernel.

This implies that the generator is cryptographically strong.  Maybe it
should not mention ChaCha20 and “cryptographic”.

> diff --git a/include/stdlib.h b/include/stdlib.h
> index 1c6f70b082..c5f5628f22 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h

> +/* Called from the fork function to reinitialize the internal lock in the
> +   child process.  This avoids deadlocks if fork is called in multi-threaded
> +   processes.  */
> +extern void __arc4random_fork_subprocess (void) attribute_hidden;

Looks like the comment is outdated.  There isn't a lock anymore.

> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 01a282f3f6..ada65d40c2 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c

> @@ -559,6 +560,8 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  #endif
>    pd->robust_head.list = &pd->robust_head;
>  
> +  __glibc_tls_internal_init (&pd->tls_state);

I think this is a bit confusing because it's not aligned with the main
thread.  memset would probably be clearer because it mirrors setup for
the main thread (zeroed implicitly).

> diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
> new file mode 100644
> index 0000000000..92d1da92cc
> --- /dev/null
> +++ b/stdlib/arc4random.c
> @@ -0,0 +1,207 @@
> +/* Pseudo Random Number Generator based on ChaCha20.
> +   Copyright (C) 2020 Free Software Foundation, Inc.

Copyright year 2022.  Please check the other files, too.

> +/* arc4random keeps two counters: 'have' is the current valid bytes not yet
> +   consumed in 'buf' while 'count' is the maximum number of bytes until a
> +   reseed.
> +
> +   Both the initial seed and reseed try to obtain entropy from the kernel
> +   and abort the process if none could be obtained.
> +
> +   The state 'buf' improves the usage of the cipher calls, allowing to call
> +   optimized implementations (if the architecture provides it) and optimize
> +   arc4random calls (since only multiple calls it will encrypt the next
> +   block).  */

I don't understand the “since only multiple calls it will encrypt the
next block” part.

> +/* Called from the fork function to reset the state.  */
> +void
> +__arc4random_fork_subprocess (void)
> +{
> +  struct arc4random_state_t *state = __glibc_tls_internal()->rand_state;
> +  if (state != NULL)
> +    {
> +      explicit_bzero (state, sizeof (struct arc4random_state_t));

sizeof (*state)?

> +      /* Force key init.  */
> +      state->count = -1;
> +    }
> +}
> +
> +/* Return the current thread random state or try to create one if there is
> +   none available.  In the case malloc can not allocate a state, arc4random
> +   will try to get entropy with arc4random_getentropy.  */
> +static struct arc4random_state_t *
> +arc4random_get_state (void)
> +{
> +  struct arc4random_state_t *state = __glibc_tls_internal()->rand_state;

Missing space before ().

> +  if (state == NULL)
> +    {
> +      state = malloc (sizeof (struct arc4random_state_t));
> +      if (state != NULL)
> +	{
> +	  /* Force key initialization on first call.  */
> +	  state->count = -1;
> +	  __glibc_tls_internal()->rand_state = state;

Missing space before ().

> +	}
> +    }
> +  return state;
> +}
> +
> +static void
> +arc4random_getrandom_failure (void)
> +{
> +  __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
> +}
> +
> +static void
> +arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
> +{
> +  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
> +
> +  /* Mix some extra entropy if provided.  */
> +  if (rnd != NULL)
> +    {
> +      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +      for (size_t i = 0; i < m; i++)
> +	state->buf[i] ^= rnd[i];
> +    }
> +
> +  /* Immediately reinit for backtracking resistance.  */
> +  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
> +  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
> +}

Should that memset be explicit_bzero for clarity?

> +static void
> +arc4random_getentropy (void *rnd, size_t len)
> +{
> +  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
> +    return;
> +
> +  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY | O_CLOEXEC);

This could use TEMP_FAILURE_RETRY, too.

> +  if (fd != -1)
> +    {
> +      uint8_t *p = rnd;
> +      uint8_t *end = p + len;
> +      do
> +	{
> +	  ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
> +	  if (ret <= 0)
> +	    arc4random_getrandom_failure ();
> +	  p += ret;
> +	}
> +      while (p < end);
> +
> +      if (__close_nocancel (fd) == 0)
> +	return;
> +    }
> +  arc4random_getrandom_failure ();
> +}
> +
> +/* Reinit the thread context by reseeding the cipher state with kernel
> +   entropy.  */
> +static void
> +arc4random_check_stir (struct arc4random_state_t *state, size_t len)

Could you add a comment describing the len parameter?

> +{
> +  if (state->count < len || state->count == -1)
> +    {
> +      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
> +      arc4random_getentropy (rnd, sizeof rnd);
> +
> +      if (state->count == -1)
> +	chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
> +      else
> +	arc4random_rekey (state, rnd, sizeof rnd);
> +
> +      explicit_bzero (rnd, sizeof rnd);
> +
> +      state->have = 0;
> +      memset (state->buf, 0, sizeof state->buf);
> +      state->count = CHACHA20_RESEED_SIZE;
> +    }
> +  if (state->count <= len)
> +    state->count = 0;
> +  else
> +    state->count -= len;
> +}

It's not clear to me if we want to enter the true branch of the second
if statement if we already executed the true branch of the first if
statement.

> +void
> +__arc4random_buf (void *buffer, size_t len)
> +{
> +  struct arc4random_state_t *state = arc4random_get_state ();
> +  if (__glibc_unlikely (state == NULL))
> +    {
> +      arc4random_getentropy (buffer, len);
> +      return;
> +    }
> +
> +  arc4random_check_stir (state, len);
> +  while (len > 0)
> +    {
> +      if (state->have > 0)
> +	{
> +	  size_t m = MIN (len, state->have);
> +	  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +	  memcpy (buffer, ks, m);
> +	  memset (ks, 0, m);

Should be explicit_bzero, I think.

> +	  buffer += m;
> +	  len -= m;
> +	  state->have -= m;
> +	}
> +      if (state->have == 0)
> +	arc4random_rekey (state, NULL, 0);
> +    }
> +}
> +libc_hidden_def (__arc4random_buf)
> +weak_alias (__arc4random_buf, arc4random_buf)
> +
> +uint32_t
> +__arc4random (void)
> +{
> +  uint32_t r;
> +
> +  struct arc4random_state_t *state = arc4random_get_state ();
> +  if (__glibc_unlikely (state == NULL))
> +    {
> +      arc4random_getentropy (&r, sizeof (uint32_t));
> +      return r;
> +    }
> +
> +  arc4random_check_stir (state, sizeof (uint32_t));
> +  if (state->have < sizeof (uint32_t))
> +    arc4random_rekey (state, NULL, 0);
> +  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
> +  memcpy (&r, ks, sizeof (uint32_t));
> +  memset (ks, 0, sizeof (uint32_t));
> +  state->have -= sizeof (uint32_t);
> +
> +  return r;
> +}

Why not simply call __arc4random_buf?  If you want to retain the
optimization, turn the implementation of __arc4random_buf into an inline
function and call it here and from __arc4random_buf.

> +libc_hidden_def (__arc4random)
> +weak_alias (__arc4random, arc4random)
> diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
> new file mode 100644
> index 0000000000..cdca639d9d
> --- /dev/null
> +++ b/stdlib/arc4random.h
> @@ -0,0 +1,45 @@

> +/* Internal arc4random buffer, used on each feedback step so offer some
> +   backtracking protection and to allow better used of vectorized
> +   chacha20 implementations.  */
> +#define CHACHA20_BUFSIZE        (8 * CHACHA20_BLOCK_SIZE)

Please add a _Static_assert that this is larger than CHACHA20_KEY_SIZE +
CHACHA20_IV_SIZE because the implementation assumes this.

> diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
> new file mode 100644
> index 0000000000..4549fc780f
> --- /dev/null
> +++ b/stdlib/chacha20.c
> @@ -0,0 +1,187 @@

> +/* 32-bit stream position, then 96-bit nonce.  */
> +#define CHACHA20_IV_SIZE	16
> +#define CHACHA20_KEY_SIZE	32
> +
> +#define CHACHA20_STATE_LEN	16
> +
> +/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
> +   XOR of the keystream with the plaintext because the plaintext is a
> +   stream of zeros.  */

You can also remove the byte swapping because the key is random and
discarded, so we don't care about its precise value.  You can do the
swapping in the ChaCha20 test instead, to get reproducible test vectors.

Hmm, given that this impacts the assembler implementations as well,
maybe we can do this at a later stage.

> diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
> index bf7cd438e1..f02a713a7b 100644
> --- a/stdlib/stdlib.h
> +++ b/stdlib/stdlib.h
> @@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
>  extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
>  
>  # ifdef __USE_MISC
> +#  include <bits/stdint-uintn.h>
>  /* Data structure for communication with thread safe versions.  This
>     type is to be regarded as opaque.  It's only exported because users
>     have to allocate objects of this type.  */

> +/* Return a random number between zero (inclusive) and the specified
> +   limit (exclusive).  */
> +extern uint32_t arc4random_uniform (uint32_t __upper_bound)
> +     __THROW __wur;
>  # endif	/* Use misc.  */
>  #endif	/* Use misc or X/Open.  */

I'm a bit worried about that <bits/stdint-uintn.h> inclusion.  It will
be a bit confusing if these types become available via <stdlib.h> in
some glibc versions.  Maybe use __uint32_t instead?

> diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h
> index d76c715a96..81a71ac54b 100644
> --- a/sysdeps/generic/tls-internal-struct.h
> +++ b/sysdeps/generic/tls-internal-struct.h
> @@ -19,10 +19,13 @@
>  #ifndef _TLS_INTERNAL_STRUCT_H
>  #define _TLS_INTERNAL_STRUCT_H 1
>  
> +#include <stdlib/arc4random.h>
> +

You can use a forward declaration instead.  Minimizing <descr.h>
dependencies seems desirable to me.

>  struct tls_internal_t
>  {
>    char *strsignal_buf;
>    char *strerror_l_buf;
> +  struct arc4random_state_t *rand_state;
>  };


> diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c
> index 898c20b61c..ec0ceeebd1 100644
> --- a/sysdeps/generic/tls-internal.c
> +++ b/sysdeps/generic/tls-internal.c
> @@ -16,6 +16,23 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> +#include <string.h>
>  #include <tls-internal.h>
>  
>  __thread struct tls_internal_t __tls_internal;
> +
> +void
> +__glibc_tls_internal_free (void)
> +{
> +  free (__tls_internal.strsignal_buf);
> +  free (__tls_internal.strerror_l_buf);
> +
> +  if (__tls_internal.rand_state != NULL)
> +    {
> +      /* Clear any lingering random state prior so if the thread stack is
> +	 cached it won't leak any data.  */
> +      explicit_bzero (__tls_internal.rand_state,
> +		      sizeof (struct arc4random_state_t));
> +      free (__tls_internal.rand_state);
> +    }

sizeof (*__tls_internal.rand_state)?

> diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c
> index e60b86fab1..1c44b39c5b 100644
> --- a/sysdeps/mach/hurd/_Fork.c
> +++ b/sysdeps/mach/hurd/_Fork.c
> @@ -665,6 +665,8 @@ retry:
>        /* Run things that want to run in the child task to set up.  */
>        RUN_HOOK (_hurd_fork_child_hook, ());
>  
> +      call_function_static_weak (__arc4random_fork_subprocess);
> +

Hmm, is the ordering correct?  Can _hurd_fork_child_hook run user code?
(This probably needs review from Hurd developers.)

> diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c
> index 6e25b021ab..045176197e 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.c
> +++ b/sysdeps/unix/sysv/linux/tls-internal.c
> @@ -1 +1,37 @@

> +void
> +__glibc_tls_internal_free (void)
> +{
> +  struct pthread *self = THREAD_SELF;
> +  free (self->tls_state.strsignal_buf);
> +  free (self->tls_state.strerror_l_buf);
> +
> +  if (self->tls_state.rand_state != NULL)
> +    {
> +      /* Clear any lingering random state prior so if the thread stack is
> +         cached it won't leak any data.  */
> +      explicit_bzero (self->tls_state.rand_state,
> +		      sizeof (struct arc4random_state_t));

sizeof (self->tls_state.rand_state)?

> diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h
> index f7a1a62135..f268a2d43b 100644
> --- a/sysdeps/unix/sysv/linux/tls-internal.h
> +++ b/sysdeps/unix/sysv/linux/tls-internal.h
> @@ -22,17 +22,21 @@
>  #include <stdlib.h>
>  #include <pthreadP.h>
>  
> +static inline void
> +__glibc_tls_internal_init (struct tls_internal_t *tls_state)
> +{
> +  tls_state->strsignal_buf = NULL;
> +  tls_state->strerror_l_buf = NULL;
> +  tls_state->rand_state = NULL;
> +}

See the comment about using memset earlier.

Thanks,
Florian

next prev parent reply	other threads:[~2022-07-12  8:57 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29 21:34 [PATCH v8 0/9] Add arc4random support Adhemerval Zanella
2022-06-29 21:34 ` [PATCH v8 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
2022-07-12  8:57   ` Florian Weimer [this message]
2022-07-12 16:57     ` Adhemerval Zanella Netto
2022-07-12 17:15       ` Florian Weimer
2022-07-12 17:21         ` Adhemerval Zanella Netto
2022-06-29 21:34 ` [PATCH v8 2/9] stdlib: Add arc4random tests Adhemerval Zanella
2022-07-12  9:20   ` Florian Weimer
2022-07-12 17:17     ` Adhemerval Zanella Netto
2022-06-29 21:34 ` [PATCH v8 3/9] benchtests: Add arc4random benchtest Adhemerval Zanella
2022-07-12  9:29   ` Florian Weimer
2022-07-12 17:26     ` Adhemerval Zanella Netto
2022-06-29 21:34 ` [PATCH v8 4/9] aarch64: Add optimized chacha20 Adhemerval Zanella
2022-07-12  9:30   ` Florian Weimer
2022-06-29 21:34 ` [PATCH v8 5/9] x86: Add SSE2 " Adhemerval Zanella
2022-06-29 21:34 ` [PATCH v8 6/9] x86: Add AVX2 " Adhemerval Zanella
2022-06-29 21:34 ` [PATCH v8 7/9] powerpc64: Add " Adhemerval Zanella
2022-06-29 21:34 ` [PATCH v8 8/9] s390x: " Adhemerval Zanella
2022-06-29 21:34 ` [PATCH v8 9/9] manual: Add documentation for arc4random functions Adhemerval Zanella
2022-06-29 21:45   ` Noah Goldstein
2022-06-29 21:53     ` Adhemerval Zanella
2022-06-29 22:05       ` Noah Goldstein
2022-06-30  7:58         ` Yann Droneaud
2022-06-30 19:37         ` Adhemerval Zanella
2022-06-29 21:55   ` Adhemerval Zanella
2022-07-12  9:31 ` [PATCH v8 0/9] Add arc4random support Florian Weimer
2022-07-12 17:36   ` Adhemerval Zanella Netto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871quqvhch.fsf@oldenburg.str.redhat.com \
    --to=fweimer@redhat.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-alpha@sourceware.org \
    --cc=ydroneaud@opteya.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).