From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by sourceware.org (Postfix) with ESMTPS id 4B81138582A1 for ; Thu, 28 Jul 2022 14:58:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4B81138582A1 Received: by mail-oi1-x22e.google.com with SMTP id c185so2680988oia.7 for ; Thu, 28 Jul 2022 07:58:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:organization:in-reply-to :content-transfer-encoding; bh=MEOzOPZE65Lh97Owc8MAnYYawo2Tonp7sbC90oyAqRE=; b=SnYNwqgUDeOssDrs/4ezLMaikkqvgX/Y7RKGlcqmTZebHvxNcaN9ewtTcRLiluEDXQ 9Up/gGUCemxjDInN3cLgMxa6gCeuX2tv8y8iFW58lslwiEkFDP+SHgKOVkVhEekkBLqO yijid2eSB0YP++dMLxJKYL1bxL+Ati4+WlmQrOqQTNSGRFMJYx2HdAOIIUskx8RAfUjE B0dj5H9V+L4xOPaGSKhECpuu21zpUONDVOGubw2W+L7tWkLx2seMMicpoqYiFIRmnblX 9AhKkTBTYxx0wgtrfu8bk9a7j9lZRjvPBd02lTvHGreq4ZDPH3LrJSrUF7wFQtjfmf/y V0vw== X-Gm-Message-State: AJIora/u5BchTywpRYDlBxL9bMlr1OqXUsGvOc0tiG+mqi7tuxl0OBxu 32FFKdBdTMMGqSy8cixzorQicw== X-Google-Smtp-Source: AGRyM1tK426L3PwEuxTO3EpamC4IZb1NWnTVqsrbEdoCQk5frtd9UTYc9MAEQX9JrvZGupH1G5RXTQ== X-Received: by 2002:a05:6808:120f:b0:325:cef0:3f7 with SMTP id a15-20020a056808120f00b00325cef003f7mr4186031oil.38.1659020321552; Thu, 28 Jul 2022 07:58:41 -0700 (PDT) Received: from ?IPV6:2804:431:c7cb:1e34:8c1:7b8a:41d4:61c7? ([2804:431:c7cb:1e34:8c1:7b8a:41d4:61c7]) by smtp.gmail.com with ESMTPSA id o23-20020a4ae597000000b00435a68593ebsm425121oov.27.2022.07.28.07.58.38 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 28 Jul 2022 07:58:41 -0700 (PDT) Message-ID: <0ec012c4-6240-e8a7-e865-147516c0e64c@linaro.org> Date: Thu, 28 Jul 2022 11:58:36 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.0.3 Subject: Re: [PATCH] stdlib: Simplify arc4random_uniform Content-Language: en-US To: Yann Droneaud , libc-alpha@sourceware.org, Florian Weimer , Carlos O'Donell References: <20220728124528.39169-1-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2022 14:58:44 -0000 On 28/07/22 11:46, Yann Droneaud wrote: > Hi, > > Le 28/07/2022 à 14:45, Adhemerval Zanella a écrit : >> It uses the bitmask with rejection [1], which calculates a mask >> being the lowest power of two bounding the request upper bound, >> successively queries new random values, and rejects values >> outside the requested range. >> >> Performance-wise, there is no much gain in trying to converse >> bits since arc4random is wrapper on getrandom syscall.  It should >> be cheaper to just query a uint32_t value.  The algorithm also >> avoids mudulo and divide operations, which might be costly >> depending of the architecture. >> >> [1] https://www.pcg-random.org/posts/bounded-rands.html >> --- >>   stdlib/arc4random_uniform.c | 131 +++++++++--------------------------- >>   1 file changed, 32 insertions(+), 99 deletions(-) >> >> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c >> index 1326dfa593..425282cd15 100644 >> --- a/stdlib/arc4random_uniform.c >> +++ b/stdlib/arc4random_uniform.c >> @@ -17,38 +17,19 @@ >>      License along with the GNU C Library; if not, see >>      .  */ >>   -#include >> -#include >>   #include >>   #include >>   -/* Return the number of bytes which cover values up to the limit.  */ >> -__attribute__ ((const)) >> -static uint32_t >> -byte_count (uint32_t n) >> -{ >> -  if (n < (1U << 8)) >> -    return 1; >> -  else if (n < (1U << 16)) >> -    return 2; >> -  else if (n < (1U << 24)) >> -    return 3; >> -  else >> -    return 4; >> -} >> +/* Return a uniformly distributed random number less than N.  The algorithm >> +   calculates a mask being the lowest power of two bounding the upper bound >> +   N, successively queries new random values, and rejects values outside of >> +   the request range. >>   -/* Fill the lower bits of the result with randomness, according to the >> -   number of bytes requested.  */ >> -static void >> -random_bytes (uint32_t *result, uint32_t byte_count) >> -{ >> -  *result = 0; >> -  unsigned char *ptr = (unsigned char *) result; >> -  if (__BYTE_ORDER == __BIG_ENDIAN) >> -    ptr += 4 - byte_count; >> -  __arc4random_buf (ptr, byte_count); >> -} >> +   For reject values, it also tries if the remaining entropy could fit on >> +   the asked range after range adjustment. >>   +   The algorithm avoids modulo and divide operations, which might be costly >> +   depending on the architecture.  */ >>   uint32_t >>   __arc4random_uniform (uint32_t n) >>   { >> @@ -57,83 +38,35 @@ __arc4random_uniform (uint32_t n) >>          only possible result for limit 1.  */ >>       return 0; >>   -  /* The bits variable serves as a source for bits.  Prefetch the >> -     minimum number of bytes needed.  */ >> -  uint32_t count = byte_count (n); >> -  uint32_t bits_length = count * CHAR_BIT; >> -  uint32_t bits; >> -  random_bytes (&bits, count); >> - >>     /* Powers of two are easy.  */ >>     if (powerof2 (n)) >> -    return bits & (n - 1); >> - >> -  /* The general case.  This algorithm follows Jérémie Lumbroso, >> -     Optimal Discrete Uniform Generation from Coin Flips, and >> -     Applications (2013), who credits Donald E. Knuth and Andrew >> -     C. Yao, The complexity of nonuniform random number generation >> -     (1976), for solving the general case. >> +    return __arc4random () & (n - 1); >>   -     The implementation below unrolls the initialization stage of the >> -     loop, where v is less than n.  */ >> +  /* mask is the smallest power of 2 minus 1 number larger than n.  */ >> +  int z = __builtin_clz (n); >> +  uint32_t mask = ~UINT32_C(0) >> z; >> +  int bits = CHAR_BIT * sizeof (uint32_t) - z; >>   -  /* Use 64-bit variables even though the intermediate results are >> -     never larger than 33 bits.  This ensures the code is easier to >> -     compile on 64-bit architectures.  */ >> -  uint64_t v; >> -  uint64_t c; >> - >> -  /* Initialize v and c.  v is the smallest power of 2 which is larger >> -     than n.*/ >> -  { >> -    uint32_t log2p1 = 32 - __builtin_clz (n); >> -    v = 1ULL << log2p1; >> -    c = bits & (v - 1); >> -    bits >>= log2p1; >> -    bits_length -= log2p1; >> -  } >> - >> -  /* At the start of the loop, c is uniformly distributed within the >> -     half-open interval [0, v), and v < 2n < 2**33.  */ >> -  while (true) >> +  while (1) >>       { >> -      if (v >= n) >> -        { >> -          /* If the candidate is less than n, accept it.  */ >> -          if (c < n) >> -            /* c is uniformly distributed on [0, n).  */ >> -            return c; >> -          else >> -            { >> -              /* c is uniformly distributed on [n, v).  */ >> -              v -= n; >> -              c -= n; >> -              /* The distribution was shifted, so c is uniformly >> -                 distributed on [0, v) again.  */ >> -            } >> -        } >> -      /* v < n here.  */ >> - >> -      /* Replenish the bit source if necessary.  */ >> -      if (bits_length == 0) >> -        { >> -          /* Overwrite the least significant byte.  */ >> -      random_bytes (&bits, 1); >> -      bits_length = CHAR_BIT; >> -        } >> - >> -      /* Double the range.  No overflow because v < n < 2**32.  */ >> -      v *= 2; >> -      /* v < 2n here.  */ >> - >> -      /* Extract a bit and append it to c.  c remains less than v and >> -         thus 2**33.  */ >> -      c = (c << 1) | (bits & 1); >> -      bits >>= 1; >> -      --bits_length; >> - >> -      /* At this point, c is uniformly distributed on [0, v) again, >> -         and v < 2n < 2**33.  */ >> +      uint32_t value = __arc4random (); >> + >> +      /* Return if the lower power of 2 minus 1 satisfy the condition.  */ >> +      uint32_t r = value & mask; >> +      if (r < n) >> +    return r; >> + >> +      /* Otherwise check if remaining bits of entropy provides fits in the >> +     bound.  */ >> +      int bits_left = z; >> +      while (bits_left >= bits) >> +    { >> +      value >>= bits; >> +      r = value & mask; >> +      if (r < n) >> +        return r; >> +      bits_left -= bits; >> +    } >>       } >>   } >>   libc_hidden_def (__arc4random_uniform) > > > It's the algorithm I suggestedin <9b6f8e87-9226-7828-3569-cff0e3575f9a@opteya.com> > https://sourceware.org/pipermail/libc-alpha/2022-April/138070.html Indeed I recall you pointed out this method, with current arc4random being a getrandom wrapper it does make more sense to use a simpler solution. > > > LGTM. > > > Reviewed-by: Yann Droneaud > Thanks.