From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x30.google.com (mail-oa1-x30.google.com [IPv6:2001:4860:4864:20::30]) by sourceware.org (Postfix) with ESMTPS id C43C53858D28 for ; Mon, 25 Apr 2022 12:20:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C43C53858D28 Received: by mail-oa1-x30.google.com with SMTP id 586e51a60fabf-e5ca5c580fso15822948fac.3 for ; Mon, 25 Apr 2022 05:20:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:from:to:cc:references:in-reply-to :content-transfer-encoding; bh=Vo9L7ffZutmrrbrjlZzbRoeXxwxBb+AUn1JYiFVlXPg=; b=yYQ+2VZimMH8P83V+37i/Cby/eGMfm4z8wJMBsQ1VhdaCDmjC/8F1MeMbwb1D3+8h7 nueTy66duJ8dNHiCM4uUciuHiCuH+2FDrxo3vTCF+N2z3Xbtl/oK+HOWwD3jFLoCKUop jgbIPwBfiSYae9Yv6NRckQL3RiUKHLfzo19dl6dWg0O8JT521i9zXisokguAZObD4945 hvuebNI9Gv6oC1llPgoLabHoEHXgVrSQh8CS+wHLXjMAFzzkydZE7P8X57bwkbvtC75g RsNCRvPeu7XQHXyRBMOwFgfGRhIVJK89+1J+on4gF8VcYp5b6OdyutAovpXpiluTlxIk ZnXQ== X-Gm-Message-State: AOAM530jP45OWE5qpoJgMYT/27yySv5UKaSBMfMFpUurjN4V+jQwzrw3 19eXaABp17UiBntod7rdiYySSw== X-Google-Smtp-Source: ABdhPJzm4ivRFMVHjDXOtMAz1gNs6P0FBpaiGjDkWM3oCU1xj52lwJWZaAsR9RoYmZB0bn4ueHJjpg== X-Received: by 2002:a05:6870:f152:b0:e9:45eb:3564 with SMTP id l18-20020a056870f15200b000e945eb3564mr1151911oac.65.1650889217994; Mon, 25 Apr 2022 05:20:17 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:4214:b4dd:3339:98d6:1ec0? ([2804:431:c7ca:4214:b4dd:3339:98d6:1ec0]) by smtp.gmail.com with ESMTPSA id fz13-20020a056870ed8d00b000e593f1f26fsm3134438oab.18.2022.04.25.05.20.16 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 25 Apr 2022 05:20:17 -0700 (PDT) Message-ID: <245c297d-3f1d-d938-2071-e9c7720aabd8@linaro.org> Date: Mon, 25 Apr 2022 09:20:15 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [PATCH v3 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Content-Language: en-US From: Adhemerval Zanella To: Yann Droneaud , libc-alpha@sourceware.org Cc: Florian Weimer References: <20220419212812.2688764-1-adhemerval.zanella@linaro.org> <20220419212812.2688764-2-adhemerval.zanella@linaro.org> <9b6f8e87-9226-7828-3569-cff0e3575f9a@opteya.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Apr 2022 12:20:22 -0000 On 25/04/2022 09:15, Adhemerval Zanella wrote: > > > On 22/04/2022 10:54, Yann Droneaud wrote: >> Le 19/04/2022 à 23:28, Adhemerval Zanella via Libc-alpha a écrit : >>> The implementation is based on scalar Chacha20, with global cache and >>> locking.  It uses getrandom or /dev/urandom as fallback to get the >>> initial entropy, and reseeds the internal state on every 16MB of >>> consumed buffer. >>> >>> It maintains an internal buffer which consumes at maximum one page on >>> most systems (assuming minimum of 4k pages).  The internal buf optimizes >>> the cipher encrypt calls, by amortize arc4random calls (where both >>> function call and locks cost are the dominating factor). >>> >>> The ChaCha20 implementation is based on the RFC8439 [1], with last >>> step that XOR with the input omited.  Since the input stream will either >>> zero bytes (initial state) or the PRNG output itself this step does not >>> add any extra entropy. >> >> >> This can also state the implementation is following OpenBSD arc4random current implementation. >> >> > > Agree, it is worth to add it. > >>> + >>> +/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the >>> +   current valid bytes not yet consumed in 'buf', while 'count' is the maximum >>> +   number of bytes until a reseed. >>> + >>> +   Both the initial seed an reseed tries to obtain entropy from the kernel >> >> an ->  and > > Ack. > >>> + >>> +static void >>> +arc4random_rekey (uint8_t *rnd, size_t rndlen) >>> +{ >>> +  memset (state->buf, 0, sizeof state->buf); >> >> There's no need to clear buf as call to chacha20_crypt() will overwrite it (since it doesn't XOR with it anymore). >> >> See https://github.com/openbsd/src/blob/master/lib/libc/crypt/arc4random.c#L121 > > Ack, I have removed it. > > >>> + >>> +      /* Extract a bit and append it to c.  c remains less than v and >>> +         thus 2**33.  */ >>> +      c = (c << 1) | (bits & 1); >>> +      bits >>= 1; >>> +      --bits_length; >>> + >>> +      /* At this point, c is uniformly distributed on [0, v) again, >>> +         and v < 2n < 2**33.  */ >>> +    } >> >> I'm not familiar with this method. >> >> It's not one reviewed at https://www.pcg-random.org/posts/bounded-rands.html >> >> In this patch I used what's called by PCG author,the "Bitmask with Rejection (Unbiased) — Apple's Method" >> >> https://github.com/Parrot-Developers/libfutils/commit/9dc7243ae2f2059b4590a702be2ca9c03578067f >> >> I like it because it doesn't uses modulo at all :) >> >> But the OpenBSD's arc4random_uniform() is even more simple in term of C code. >> > > You can find the reference paper on arxiv [1]. The main advantage of this > method is the that the unit of randomness is not the uniform random variable > (uint32_t), but a random bit. It optimizes the internal buffer sampling by > initially consuming a 32-bit random variable and then sampling byte per byte. > Depending of the upper bound requested, it might lead to better CPU utilization. > > From the article: > > But unexpectedly, it turns out that the extra buffering inherent in consuming > randomness random-bit-by-random-bit, although time consuming, is more than > compensated by the increased efficiency in using random bits compared with most > common methods. > > It is specially true if you consider that both chacha20 block generation and > getting kernel entropy (through either getrandom or /dev/urandom) are way > more time consuming than bit twiddling. > > [1] https://arxiv.org/pdf/1304.1916.pdf > And you can see that this method does not use modulo as well. It does use arc4random_buf internally, what might add some overhead. One possible optimization could to add a internal function to consume only one byte, but I am not sure if this really pays off.