From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com [IPv6:2607:f8b0:4864:20::231]) by sourceware.org (Postfix) with ESMTPS id 8951A3857BA1 for ; Tue, 12 Jul 2022 17:21:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8951A3857BA1 Received: by mail-oi1-x231.google.com with SMTP id r82so11301899oig.2 for ; Tue, 12 Jul 2022 10:21:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:organization:in-reply-to :content-transfer-encoding; bh=fYT4wJRwqy27QzFxyerCcXzHBG8ZGsSOJ2wz+4nBu18=; b=qLMdHhs9H9l1kyqy6npBs6QuLD+C7w3zOn3ym3GxLdnWtbeh3mnDVrfllgQFXlV9YN d9vylB7vHo6H/PzYzfIXjoM55y/4Tpq/yg1WEwAI0CsDzG71/LTO1zG6+h1h6uCUdssa Rwj/70JAgaSaOYQ2awpDpi7l+zg76bLQFdQ6tpnsfKveyJ9syY6Wwvx1ES6601Onau+I MI3My64d1CQEL2vhRkbXk6bvoTJk69d8DUHqEAuaMz/eX71wDcFWGjilIzTNe5vnrbGO BOx2qBvCiAdpUDRdJT3Bc/m3NH3Led1zm5J9cfONAOpn3GfXtHFU1Q+OxV5QTe4q7r2/ aQMg== X-Gm-Message-State: AJIora+yvY23KeVki4Tq3ef66hM6rlnrz0OzYijnhKeLxS3yCmGRzZuQ mR2WwqTzC9OCYEA94NfnOuwVyw== X-Google-Smtp-Source: AGRyM1ug2hmKIl6D7/Um0uKd+5uR2b0FKkFBXI1GIS46OyLD+f5zlpthb5rz+DFLgfzMJC5PiPjazQ== X-Received: by 2002:a05:6808:238c:b0:335:e298:f2c0 with SMTP id bp12-20020a056808238c00b00335e298f2c0mr2533982oib.191.1657646501803; Tue, 12 Jul 2022 10:21:41 -0700 (PDT) Received: from [12.18.8.156] ([201.46.27.6]) by smtp.gmail.com with ESMTPSA id s17-20020a056830149100b0061c564a83ebsm1500222otq.19.2022.07.12.10.21.40 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Jul 2022 10:21:41 -0700 (PDT) Message-ID: <271ea8e3-36b9-ada1-5f9b-0b7f1affe598@linaro.org> Date: Tue, 12 Jul 2022 14:21:39 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.0.2 Subject: Re: [PATCH v8 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Content-Language: en-US To: Florian Weimer Cc: libc-alpha@sourceware.org, Yann Droneaud References: <20220629213428.3065430-1-adhemerval.zanella@linaro.org> <20220629213428.3065430-2-adhemerval.zanella@linaro.org> <871quqvhch.fsf@oldenburg.str.redhat.com> <4172a764-e463-b357-fd1e-b6151a239775@linaro.org> <87bktus157.fsf@oldenburg.str.redhat.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <87bktus157.fsf@oldenburg.str.redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Jul 2022 17:21:45 -0000 On 12/07/22 14:15, Florian Weimer wrote: > * Adhemerval Zanella Netto: > >>>> +/* arc4random keeps two counters: 'have' is the current valid bytes not yet >>>> + consumed in 'buf' while 'count' is the maximum number of bytes until a >>>> + reseed. >>>> + >>>> + Both the initial seed and reseed try to obtain entropy from the kernel >>>> + and abort the process if none could be obtained. >>>> + >>>> + The state 'buf' improves the usage of the cipher calls, allowing to call >>>> + optimized implementations (if the architecture provides it) and optimize >>>> + arc4random calls (since only multiple calls it will encrypt the next >>>> + block). */ > >>> I don't understand the “since only multiple calls it will encrypt the >>> next block” part. >> >> I changed to 'and minimize function call overhead'. Using the generic >> implementation, a 8 times the chacha20 blocks buffer shows about 2x more >> throughput um aarch64. >> >> A buffer with 4x the chacha20 block size shows slight less performance, >> so one option might to make the buffer sizes arch-specific (since AVX2, >> and potentially AVX512 requires large block size for the arch-specific >> implementations). > > Ah, that makes sense. I think the quoted part is just a bit garbled and > needs some polishing. > >>>> +/* Reinit the thread context by reseeding the cipher state with kernel >>>> + entropy. */ >>>> +static void >>>> +arc4random_check_stir (struct arc4random_state_t *state, size_t len) >>> Could you add a comment describing the len parameter? >> >> I changed to: >> >> /* Check if the thread context STATE should be reseed with kernel entropy >> depending of requested LEN bytes. If there is less than requested, >> the state is either initialized or reseed, otherwise the internal >> counter subtract the requested lenght. */ > > “reseeded” Ack. > >>> Why not simply call __arc4random_buf? If you want to retain the >>> optimization, turn the implementation of __arc4random_buf into an inline >>> function and call it here and from __arc4random_buf. >> >> I actually tried in some interation and I recall that it yield some >> worse throughput. I just tested again and it holds true, on >> an aarch64 system current approach with generic implementation yields >> 290 MB/s while calling arc4random_buf shows 172 MB/s. >> >> I am trying to decompose the function to eliminate the need of the >> loop (which I think compiler can't optimize away for arc4random) >> but I don't think it would be simpler than open code the logic >> on both functions. > > Hmm, this isn't great, but I see why you are doing it. Agreed, but performance difference is significant so I think it worth the code duplication in this case.