From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <adhemerval.zanella@linaro.org>
Received: from mail-oa1-x36.google.com (mail-oa1-x36.google.com
 [IPv6:2001:4860:4864:20::36])
 by sourceware.org (Postfix) with ESMTPS id 0F65738582BF
 for <libc-alpha@sourceware.org>; Mon,  1 Aug 2022 17:45:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0F65738582BF
Received: by mail-oa1-x36.google.com with SMTP id
 586e51a60fabf-10ee900cce0so2573091fac.5
 for <libc-alpha@sourceware.org>; Mon, 01 Aug 2022 10:45:36 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:from:to:references:organization:in-reply-to
 :content-transfer-encoding;
 bh=NBKN8GyNXUAIefjuO0piiLBERfje2loORbW71RHQ0OY=;
 b=5r8d+KRTuL3L+D/6+cygt1oN+RfK8v6ob/WGlk/SdXDI8Z1lZfMCKA08A5OP3mB7P8
 M/D7eDKdzzFP/ZV/IzAk/W8pTV3TesShr6pa6TJLAwChxK18TOBFmBK6ot2liNI1oHd9
 Sv4w6VldK5mvMC+8XRsC4Ktc/f8YT25UuRUSFPZK5CfQqozWhgvVrGAYBgsFWlICXnGV
 vS8tfrVHLJxRg5ZM/S5esJpTDDns0wC7okPp2GQkBKiRBfWugqBmPtwY4AAdASHBu/Lz
 xXjJiXK0Cxm6nRcrY7qNpAk3NosP/XtZpoZt5Xe/Hz+cGERirmD8Sjaud8aANWfewO61
 xxAA==
X-Gm-Message-State: ACgBeo0x+AAlct7w8DB7FSa7m+KnCxyaYiFMZqHKbYNL8Ai4Zv8wTfvj
 PArVvUcyW6QZmStY/5WU+Vxg/R8I3Onu1Q==
X-Google-Smtp-Source: AA6agR7UdCWBAoo5wD4WWfw+Othi4XqbCiRksVoq2LC9xqTpNUjxlQsO+0aq8xbT2JBaBWz+SXM3pg==
X-Received: by 2002:a05:6870:1712:b0:10e:e9e1:8a80 with SMTP id
 h18-20020a056870171200b0010ee9e18a80mr1496175oae.247.1659375934900; 
 Mon, 01 Aug 2022 10:45:34 -0700 (PDT)
Received: from ?IPV6:2804:431:c7cb:1e34:d541:f280:24d8:b04c?
 ([2804:431:c7cb:1e34:d541:f280:24d8:b04c])
 by smtp.gmail.com with ESMTPSA id
 13-20020aca100d000000b00339eb701c6csm2529512oiq.41.2022.08.01.10.45.33
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Mon, 01 Aug 2022 10:45:34 -0700 (PDT)
Message-ID: <bead926d-72ce-5356-f1b1-6b0936752eb4@linaro.org>
Date: Mon, 1 Aug 2022 14:45:31 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.1.0
Subject: Re: [PATCH v2] stdlib: Simplify arc4random_uniform
Content-Language: en-US
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org, Florian Weimer <fweimer@redhat.com>,
 Carlos O'Donell <carlos@redhat.com>, Yann Droneaud <ydroneaud@opteya.com>,
 Paul Eggert <eggert@cs.ucla.edu>
References: <20220729123211.876374-1-adhemerval.zanella@linaro.org>
Organization: Linaro
In-Reply-To: <20220729123211.876374-1-adhemerval.zanella@linaro.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Mon, 01 Aug 2022 17:45:38 -0000

Based on previous comments I will commit this shortly.

On 29/07/22 09:32, Adhemerval Zanella wrote:
> It uses the bitmask with rejection [1], which calculates a mask
> being the lowest power of two bounding the request upper bound,
> successively queries new random values, and rejects values
> outside the requested range.
> 
> Performance-wise, there is no much gain in trying to conserve
> bits since arc4random is wrapper on getrandom syscall.  It should
> be cheaper to just query a uint32_t value.  The algorithm also
> avoids modulo and divide operations, which might be costly
> depending of the architecture.
> 
> [1] https://www.pcg-random.org/posts/bounded-rands.html
> 
> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
> ---
> v2: Fixed typos in commit message and simplify loop.
> ---
>  stdlib/arc4random_uniform.c | 129 +++++++++---------------------------
>  1 file changed, 30 insertions(+), 99 deletions(-)
> 
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> index 1326dfa593..5aa98d1c13 100644
> --- a/stdlib/arc4random_uniform.c
> +++ b/stdlib/arc4random_uniform.c
> @@ -17,38 +17,19 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <endian.h>
> -#include <libc-lock.h>
>  #include <stdlib.h>
>  #include <sys/param.h>
>  
> -/* Return the number of bytes which cover values up to the limit.  */
> -__attribute__ ((const))
> -static uint32_t
> -byte_count (uint32_t n)
> -{
> -  if (n < (1U << 8))
> -    return 1;
> -  else if (n < (1U << 16))
> -    return 2;
> -  else if (n < (1U << 24))
> -    return 3;
> -  else
> -    return 4;
> -}
> +/* Return a uniformly distributed random number less than N.  The algorithm
> +   calculates a mask being the lowest power of two bounding the upper bound
> +   N, successively queries new random values, and rejects values outside of
> +   the request range.
>  
> -/* Fill the lower bits of the result with randomness, according to the
> -   number of bytes requested.  */
> -static void
> -random_bytes (uint32_t *result, uint32_t byte_count)
> -{
> -  *result = 0;
> -  unsigned char *ptr = (unsigned char *) result;
> -  if (__BYTE_ORDER == __BIG_ENDIAN)
> -    ptr += 4 - byte_count;
> -  __arc4random_buf (ptr, byte_count);
> -}
> +   For reject values, it also tries if the remaining entropy could fit on
> +   the asked range after range adjustment.
>  
> +   The algorithm avoids modulo and divide operations, which might be costly
> +   depending on the architecture.  */
>  uint32_t
>  __arc4random_uniform (uint32_t n)
>  {
> @@ -57,83 +38,33 @@ __arc4random_uniform (uint32_t n)
>         only possible result for limit 1.  */
>      return 0;
>  
> -  /* The bits variable serves as a source for bits.  Prefetch the
> -     minimum number of bytes needed.  */
> -  uint32_t count = byte_count (n);
> -  uint32_t bits_length = count * CHAR_BIT;
> -  uint32_t bits;
> -  random_bytes (&bits, count);
> -
>    /* Powers of two are easy.  */
>    if (powerof2 (n))
> -    return bits & (n - 1);
> -
> -  /* The general case.  This algorithm follows Jérémie Lumbroso,
> -     Optimal Discrete Uniform Generation from Coin Flips, and
> -     Applications (2013), who credits Donald E. Knuth and Andrew
> -     C. Yao, The complexity of nonuniform random number generation
> -     (1976), for solving the general case.
> +    return __arc4random () & (n - 1);
>  
> -     The implementation below unrolls the initialization stage of the
> -     loop, where v is less than n.  */
> +  /* mask is the smallest power of 2 minus 1 number larger than n.  */
> +  int z = __builtin_clz (n);
> +  uint32_t mask = ~UINT32_C(0) >> z;
> +  int bits = CHAR_BIT * sizeof (uint32_t) - z;
>  
> -  /* Use 64-bit variables even though the intermediate results are
> -     never larger than 33 bits.  This ensures the code is easier to
> -     compile on 64-bit architectures.  */
> -  uint64_t v;
> -  uint64_t c;
> -
> -  /* Initialize v and c.  v is the smallest power of 2 which is larger
> -     than n.*/
> -  {
> -    uint32_t log2p1 = 32 - __builtin_clz (n);
> -    v = 1ULL << log2p1;
> -    c = bits & (v - 1);
> -    bits >>= log2p1;
> -    bits_length -= log2p1;
> -  }
> -
> -  /* At the start of the loop, c is uniformly distributed within the
> -     half-open interval [0, v), and v < 2n < 2**33.  */
> -  while (true)
> +  while (1)
>      {
> -      if (v >= n)
> -        {
> -          /* If the candidate is less than n, accept it.  */
> -          if (c < n)
> -            /* c is uniformly distributed on [0, n).  */
> -            return c;
> -          else
> -            {
> -              /* c is uniformly distributed on [n, v).  */
> -              v -= n;
> -              c -= n;
> -              /* The distribution was shifted, so c is uniformly
> -                 distributed on [0, v) again.  */
> -            }
> -        }
> -      /* v < n here.  */
> -
> -      /* Replenish the bit source if necessary.  */
> -      if (bits_length == 0)
> -        {
> -          /* Overwrite the least significant byte.  */
> -	  random_bytes (&bits, 1);
> -	  bits_length = CHAR_BIT;
> -        }
> -
> -      /* Double the range.  No overflow because v < n < 2**32.  */
> -      v *= 2;
> -      /* v < 2n here.  */
> -
> -      /* Extract a bit and append it to c.  c remains less than v and
> -         thus 2**33.  */
> -      c = (c << 1) | (bits & 1);
> -      bits >>= 1;
> -      --bits_length;
> -
> -      /* At this point, c is uniformly distributed on [0, v) again,
> -         and v < 2n < 2**33.  */
> +      uint32_t value = __arc4random ();
> +
> +      /* Return if the lower power of 2 minus 1 satisfy the condition.  */
> +      uint32_t r = value & mask;
> +      if (r < n)
> +	return r;
> +
> +      /* Otherwise check if remaining bits of entropy provides fits in the
> +	 bound.  */
> +      for (int bits_left = z; bits_left >= bits; bits_left -= bits)
> +	{
> +	  value >>= bits;
> +	  r = value & mask;
> +	  if (r < n)
> +	    return r;
> +	}
>      }
>  }
>  libc_hidden_def (__arc4random_uniform)