From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x36.google.com (mail-oa1-x36.google.com [IPv6:2001:4860:4864:20::36]) by sourceware.org (Postfix) with ESMTPS id 0F65738582BF for ; Mon, 1 Aug 2022 17:45:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0F65738582BF Received: by mail-oa1-x36.google.com with SMTP id 586e51a60fabf-10ee900cce0so2573091fac.5 for ; Mon, 01 Aug 2022 10:45:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:from:to:references:organization:in-reply-to :content-transfer-encoding; bh=NBKN8GyNXUAIefjuO0piiLBERfje2loORbW71RHQ0OY=; b=5r8d+KRTuL3L+D/6+cygt1oN+RfK8v6ob/WGlk/SdXDI8Z1lZfMCKA08A5OP3mB7P8 M/D7eDKdzzFP/ZV/IzAk/W8pTV3TesShr6pa6TJLAwChxK18TOBFmBK6ot2liNI1oHd9 Sv4w6VldK5mvMC+8XRsC4Ktc/f8YT25UuRUSFPZK5CfQqozWhgvVrGAYBgsFWlICXnGV vS8tfrVHLJxRg5ZM/S5esJpTDDns0wC7okPp2GQkBKiRBfWugqBmPtwY4AAdASHBu/Lz xXjJiXK0Cxm6nRcrY7qNpAk3NosP/XtZpoZt5Xe/Hz+cGERirmD8Sjaud8aANWfewO61 xxAA== X-Gm-Message-State: ACgBeo0x+AAlct7w8DB7FSa7m+KnCxyaYiFMZqHKbYNL8Ai4Zv8wTfvj PArVvUcyW6QZmStY/5WU+Vxg/R8I3Onu1Q== X-Google-Smtp-Source: AA6agR7UdCWBAoo5wD4WWfw+Othi4XqbCiRksVoq2LC9xqTpNUjxlQsO+0aq8xbT2JBaBWz+SXM3pg== X-Received: by 2002:a05:6870:1712:b0:10e:e9e1:8a80 with SMTP id h18-20020a056870171200b0010ee9e18a80mr1496175oae.247.1659375934900; Mon, 01 Aug 2022 10:45:34 -0700 (PDT) Received: from ?IPV6:2804:431:c7cb:1e34:d541:f280:24d8:b04c? ([2804:431:c7cb:1e34:d541:f280:24d8:b04c]) by smtp.gmail.com with ESMTPSA id 13-20020aca100d000000b00339eb701c6csm2529512oiq.41.2022.08.01.10.45.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 01 Aug 2022 10:45:34 -0700 (PDT) Message-ID: Date: Mon, 1 Aug 2022 14:45:31 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.1.0 Subject: Re: [PATCH v2] stdlib: Simplify arc4random_uniform Content-Language: en-US From: Adhemerval Zanella Netto To: libc-alpha@sourceware.org, Florian Weimer , Carlos O'Donell , Yann Droneaud , Paul Eggert References: <20220729123211.876374-1-adhemerval.zanella@linaro.org> Organization: Linaro In-Reply-To: <20220729123211.876374-1-adhemerval.zanella@linaro.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Aug 2022 17:45:38 -0000 Based on previous comments I will commit this shortly. On 29/07/22 09:32, Adhemerval Zanella wrote: > It uses the bitmask with rejection [1], which calculates a mask > being the lowest power of two bounding the request upper bound, > successively queries new random values, and rejects values > outside the requested range. > > Performance-wise, there is no much gain in trying to conserve > bits since arc4random is wrapper on getrandom syscall. It should > be cheaper to just query a uint32_t value. The algorithm also > avoids modulo and divide operations, which might be costly > depending of the architecture. > > [1] https://www.pcg-random.org/posts/bounded-rands.html > > Reviewed-by: Yann Droneaud > --- > v2: Fixed typos in commit message and simplify loop. > --- > stdlib/arc4random_uniform.c | 129 +++++++++--------------------------- > 1 file changed, 30 insertions(+), 99 deletions(-) > > diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c > index 1326dfa593..5aa98d1c13 100644 > --- a/stdlib/arc4random_uniform.c > +++ b/stdlib/arc4random_uniform.c > @@ -17,38 +17,19 @@ > License along with the GNU C Library; if not, see > . */ > > -#include > -#include > #include > #include > > -/* Return the number of bytes which cover values up to the limit. */ > -__attribute__ ((const)) > -static uint32_t > -byte_count (uint32_t n) > -{ > - if (n < (1U << 8)) > - return 1; > - else if (n < (1U << 16)) > - return 2; > - else if (n < (1U << 24)) > - return 3; > - else > - return 4; > -} > +/* Return a uniformly distributed random number less than N. The algorithm > + calculates a mask being the lowest power of two bounding the upper bound > + N, successively queries new random values, and rejects values outside of > + the request range. > > -/* Fill the lower bits of the result with randomness, according to the > - number of bytes requested. */ > -static void > -random_bytes (uint32_t *result, uint32_t byte_count) > -{ > - *result = 0; > - unsigned char *ptr = (unsigned char *) result; > - if (__BYTE_ORDER == __BIG_ENDIAN) > - ptr += 4 - byte_count; > - __arc4random_buf (ptr, byte_count); > -} > + For reject values, it also tries if the remaining entropy could fit on > + the asked range after range adjustment. > > + The algorithm avoids modulo and divide operations, which might be costly > + depending on the architecture. */ > uint32_t > __arc4random_uniform (uint32_t n) > { > @@ -57,83 +38,33 @@ __arc4random_uniform (uint32_t n) > only possible result for limit 1. */ > return 0; > > - /* The bits variable serves as a source for bits. Prefetch the > - minimum number of bytes needed. */ > - uint32_t count = byte_count (n); > - uint32_t bits_length = count * CHAR_BIT; > - uint32_t bits; > - random_bytes (&bits, count); > - > /* Powers of two are easy. */ > if (powerof2 (n)) > - return bits & (n - 1); > - > - /* The general case. This algorithm follows Jérémie Lumbroso, > - Optimal Discrete Uniform Generation from Coin Flips, and > - Applications (2013), who credits Donald E. Knuth and Andrew > - C. Yao, The complexity of nonuniform random number generation > - (1976), for solving the general case. > + return __arc4random () & (n - 1); > > - The implementation below unrolls the initialization stage of the > - loop, where v is less than n. */ > + /* mask is the smallest power of 2 minus 1 number larger than n. */ > + int z = __builtin_clz (n); > + uint32_t mask = ~UINT32_C(0) >> z; > + int bits = CHAR_BIT * sizeof (uint32_t) - z; > > - /* Use 64-bit variables even though the intermediate results are > - never larger than 33 bits. This ensures the code is easier to > - compile on 64-bit architectures. */ > - uint64_t v; > - uint64_t c; > - > - /* Initialize v and c. v is the smallest power of 2 which is larger > - than n.*/ > - { > - uint32_t log2p1 = 32 - __builtin_clz (n); > - v = 1ULL << log2p1; > - c = bits & (v - 1); > - bits >>= log2p1; > - bits_length -= log2p1; > - } > - > - /* At the start of the loop, c is uniformly distributed within the > - half-open interval [0, v), and v < 2n < 2**33. */ > - while (true) > + while (1) > { > - if (v >= n) > - { > - /* If the candidate is less than n, accept it. */ > - if (c < n) > - /* c is uniformly distributed on [0, n). */ > - return c; > - else > - { > - /* c is uniformly distributed on [n, v). */ > - v -= n; > - c -= n; > - /* The distribution was shifted, so c is uniformly > - distributed on [0, v) again. */ > - } > - } > - /* v < n here. */ > - > - /* Replenish the bit source if necessary. */ > - if (bits_length == 0) > - { > - /* Overwrite the least significant byte. */ > - random_bytes (&bits, 1); > - bits_length = CHAR_BIT; > - } > - > - /* Double the range. No overflow because v < n < 2**32. */ > - v *= 2; > - /* v < 2n here. */ > - > - /* Extract a bit and append it to c. c remains less than v and > - thus 2**33. */ > - c = (c << 1) | (bits & 1); > - bits >>= 1; > - --bits_length; > - > - /* At this point, c is uniformly distributed on [0, v) again, > - and v < 2n < 2**33. */ > + uint32_t value = __arc4random (); > + > + /* Return if the lower power of 2 minus 1 satisfy the condition. */ > + uint32_t r = value & mask; > + if (r < n) > + return r; > + > + /* Otherwise check if remaining bits of entropy provides fits in the > + bound. */ > + for (int bits_left = z; bits_left >= bits; bits_left -= bits) > + { > + value >>= bits; > + r = value & mask; > + if (r < n) > + return r; > + } > } > } > libc_hidden_def (__arc4random_uniform)