From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-90816-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 22236 invoked by alias); 5 Mar 2018 19:10:42 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 22171 invoked by uid 89); 5 Mar 2018 19:10:40 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-3.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=adoption
X-HELO: mail-yb0-f172.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:subject:to:references:from:message-id:date
         :user-agent:mime-version:in-reply-to:content-language
         :content-transfer-encoding;
        bh=X5al+gN46V71IxjaNWtWo68ErQhhxhQtZ7cRG9Msrs8=;
        b=OMrXrwoGSnHvCf5M9NIndvJ29CtpExAHmd/OPcUi7+pOaW6KUoKfUXUlHNG52DEaGZ
         ySnzW3KxKURhH1VV2TjCbEef5sCbMH4ruDh+vWR6JYP56HVWCvDbubrboNC76uOvTslp
         Y+NnWXcxaDulGLO4fDVoL4W55m2f7QN9QE7qDD79mw5DyMD4yCRCR8PZ4yq42hURFM3f
         Q51KrQECD0vT674XLnbxHun50ty8GOEpfQSrHmRoS9RYENgWP8kIYzbOcL5x7OmxjFXz
         5cFdyqJ9FYg2nwNLkdS7xTArb8PN5orLmbH0XOgivj0/NJUqV1eiHTd26M18FNdsNesd
         fbqg==
X-Gm-Message-State: AElRT7FCtTML8qoeD8fkcXaOpfdKw6N2srqjYSIie8MY37sXsNkbUzKk
	pPoGUvKzrGo3N1FJAW8V02W9N28AqMM=
X-Google-Smtp-Source: AG47ELtaAEW/rJMft2F6pd8JCZ4qy+NKUkPhkID+byoq1NCaqy6vfBHg88ND9PPGqxmTMDEIGmcp3w==
X-Received: by 2002:a25:58c1:: with SMTP id m184-v6mr9732988ybb.466.1520277036194;
        Mon, 05 Mar 2018 11:10:36 -0800 (PST)
Subject: Re: [PATCH] elf: Add AES-128 implementation for arc4random
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
References: <20180302125302.5CF0F4045458E@oldenburg.str.redhat.com>
 <afe2fe20-8e3f-a7d2-971a-6a08778e92b5@linaro.org>
 <605fe685-1ca3-ec81-0a90-6dd95c26383b@redhat.com>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Message-ID: <696f651f-0107-072b-0a43-4c855d84b41f@linaro.org>
Date: Mon, 05 Mar 2018 19:10:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <605fe685-1ca3-ec81-0a90-6dd95c26383b@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2018-03/txt/msg00123.txt.bz2


On 05/03/2018 11:48, Florian Weimer wrote:
> On 03/02/2018 08:27 PM, Adhemerval Zanella wrote:
>>
>>
>> On 02/03/2018 09:53, Florian Weimer wrote:
>>> This commit imports the AES-128 implementation from libgcrypt.
>>>
>>> This code has to reside in ld.so because it will be used to
>>> initialize the stack protector cookie and the pointer guard
>>> from the AT_RANDOM variable.
>>>
>>> AES-128 was chosen as the cryptographic primitive because hardware
>>> support for AES-128 is much more widespread than for SHA-1 or SHA-256.
>>> This means that we can add hardware acceleration for arc4random for
>>> a larger number of systems, as a subsequent optimization.
>>
>> I noted other system (*BSD, Linux kernel, etc.) are using ChaCha20 instead
>> of AES-128 for both arc4random and /dev/{u}random, but I don't have much
> 
> FreeBSD still uses RC4 for arc4random for some reason.
> 

Only for userspace still and I think what it is holding it up ChaCha20 
adoption is how to handle the state after fork [1]. Kernel interface uses
ChaCha20 [2].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182610
[2] https://github.com/freebsd/freebsd/blob/bd86cc8ebf468975220088041fbe0b3be5649310/sys/libkern/arc4random.c

>> information why exactly ChaCha20 was picked instead.Â  Checking some
>> discussion why ChaCha20 is preferable [1] it seems is usually faster on
>> hardware without specialized instructions and less susceptible to cache
>> timing attacks. However, cryptoanalysis is not really forte, so I just
>> curious why we should do something different than others for arc4random.
>>
>> [1] https://crypto.stackexchange.com/questions/34455/whats-the-appeal-of-using-chacha20-instead-of-aes
> 
> The advantage of ChaCha20 is that the key schedule is very cheap.Â  This means that you can feed back the output from the generator and use it for the next block (this is called âbacktracking resistanceâ).Â  It's not really advisable to do this with a software implementation of AES-128 for performance reasons.
> 
> The downside is that thisÂ Â  Xâ := fâ¿(Xâ) construction risks into running a small(ish) cycle after many blocks.Â  Therefore, the implementation in libbsd feeds back not just the 256 key bits, but also 64 bits for the initial vector, probably hoping that the 320 bits make it sufficiently unlikely that the initial run until the first repeated block is shorter than 2**64 iterations or so.
> 
> If you encrypt a counter using AES-128, you do not have this problem because the encrypted blocks are all distinct, but this means there is a generic discriminator because expected block repeats after 2**64 or so blocks (due to the birthday paradox) simply do not happen.
> 
> Another advantage of encrypted counter approach is that you have very little per-thread state.Â  Basically just the counter and a key stream discriminator (see pthread_thread_number_np), although for some coprocessor implementations, it may be beneficial to create more than a single block per iteration.

Right thanks for the thoughtfully explanation, this seems a winner factor 
comparing to chachar20 one.

> 
> The backtracking protection in libbsd still looks somewhat expensive, so libbsd generates 1024 output bytes for each feedback step.Â  40 bytes are fed back, the rest is returned to the application piece by piece.Â  This buffer really has to be thread-local if we want an implementation which scales, and using 1024 bytes for this seems to be a bit over top.Â  We could probably do with fewer bytes than that (40 + X), but it will substantially reduce generator throughput.

Linux implementation seems to be doing the later, as least on 'extract_crng_user'
function by feeding 'crng_backtrack_protect' with at most 64 bytes (minus the 
extracted on for user).  I am not sure if this is suffice for backtracking 
protection, but it seems feasible for a per-thread state.

> 
> Performance-wise, on current Intel CPUs with AES support, the AES-128 encrypted counter approach will provide a throughput of around 3 gigabyte per second, with 80 bytes of per-thread state.Â  I expect that the ChaCha20 approach in libbsd will reach this level of performance only with a per-thread large buffer, such as the 1064 bytes used in libbsd, which should give around 2.75 gigabyte per second.Â  With 396 bytes of per-thread state, the predicted performance is 1.1 gigabyte per second, and with 104 bytes, it is 0.24 gigabyte per second.

My main concern is to select a algorithm because we can get improved numbers on
certain platform while on other (which does not have hardware acceleration) a
different approach would be more suitable both in performance and security.

However as you noted, if the idea is to add scalability I think per-thread state
should be considered as well and looks like AES 128 in software mode should be
good enough (and with my qsort refactor patches and recent thread about O(n^2)
worst case being a security issue I think it would be good to have fast per
thread rng). Do you have any numbers for this software implementation?

> 
> The nominal security strength of ChaCha20 is higher than that of AES-128 (256 bits vs less than 128 bits), but this is for the cipher itself, not for the generators derived from it.Â  I'm not aware of any reviews of the actual generators.
> 
> So if we want backtracking protection, we'd probably have to go with the 396-byte ChaCha20 approach (maybe after recovering the TLS space occupied by _res).Â  Otherwise, AES-128 will be the better choice for a lot of users (who have access to hardware with AES-128 acceleration).
> 
> Unfortunately, maintaining both approaches has quite a bit of overhead because they are so different.

I agree and I think we should avoid it as well.