From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Apr 2022 15:12:28 -0300 [thread overview]
Message-ID: <a3845d9c-1182-5b1c-1113-89d16c2e89db@linaro.org> (raw)
In-Reply-To: <CAFUsyfLFsWN=bPabMZonSAg0TX_92btt5vfA5FPPKDc-jUxw9g@mail.gmail.com>
On 14/04/2022 14:20, Noah Goldstein wrote:
> On Thu, Apr 14, 2022 at 12:17 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 13/04/2022 20:04, Noah Goldstein wrote:
>>> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>>
>>>> + .text
>>>
>>> section avx2
>>>
>>
>> Ack, I changed to '.section .text.avx2, "ax", @progbits'.
>>
>>>> + .align 32
>>>> +chacha20_data:
>>>> +L(shuf_rol16):
>>>> + .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
>>>> +L(shuf_rol8):
>>>> + .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
>>>> +L(inc_counter):
>>>> + .byte 0,1,2,3,4,5,6,7
>>>> +L(unsigned_cmp):
>>>> + .long 0x80000000
>>>> +
>>>> +ENTRY (__chacha20_avx2_blocks8)
>>>> + /* input:
>>>> + * %rdi: input
>>>> + * %rsi: dst
>>>> + * %rdx: src
>>>> + * %rcx: nblks (multiple of 8)
>>>> + */
>>>> + vzeroupper;
>>>
>>> vzeroupper needs to be replaced with VZEROUPPER_RETURN
>>> and we need a transaction safe version unless this can never
>>> be called during a transaction.
>>
>> I think you meant VZEROUPPER here (VZEROUPPER_RETURN seems to trigger
>> test case failures).  What do you mean by a 'transaction safe version'?
>> An extra __chacha20_avx2_blocks8 implementation to handle it?  Or
>> disabling it if RTM is enabled?
>
> For now you can just update the cpufeature check to do ssse3 if RTM is enabled.
Right, I will do it.
>
>>
>>>> +
>>>> + /* clear the used vector registers and stack */
>>>> + vpxor X0, X0, X0;
>>>> + vmovdqa X0, (STACK_VEC_X12)(%rsp);
>>>> + vmovdqa X0, (STACK_VEC_X13)(%rsp);
>>>> + vmovdqa X0, (STACK_TMP)(%rsp);
>>>> + vmovdqa X0, (STACK_TMP1)(%rsp);
>>>> + vzeroall;
>>>
>>> Do you need vzeroall?
>>> Why not vzeroupper? Is it a security concern to leave info in the xmm pieces?
>>
>> I would assume so, since it is in the original libgcrypt optimization.  As
>> with the ssse3 version, I am not sure we really need that level of
>> hardening, but it would be good to keep the initial revision as close
>> as possible to libgcrypt.
>
> Got it.
>
>>
>>>
>>>
>>>> +
>>>> + /* eax zeroed by round loop. */
>>>> + leave;
>>>> + cfi_adjust_cfa_offset(-8)
>>>> + cfi_def_cfa_register(%rsp);
>>>> + ret;
>>>> + int3;
>>>
>>> Why do we need int3 here?
>>
>> I think the ssse3 answer applies here as well.
>>
>>>> +END(__chacha20_avx2_blocks8)
>>>> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
>>>> index 37a4fdfb1f..7e9e7755f3 100644
>>>> --- a/sysdeps/x86_64/chacha20_arch.h
>>>> +++ b/sysdeps/x86_64/chacha20_arch.h
>>>> @@ -22,11 +22,25 @@
>>>>
>>>> unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
>>>> const uint8_t *src, size_t nblks);
>>>> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
>>>> + const uint8_t *src, size_t nblks);
>>>>
>>>> static inline void
>>>> chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
>>>> size_t bytes)
>>>> {
>>>> + const struct cpu_features* cpu_features = __get_cpu_features ();
>>>
>>> Can we do this with an ifunc and take the cpufeature check off the critical
>>> path?
>>
>> Ditto.
>>
>>>> +
>>>> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
>>>> + {
>>>> + size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>>>> + nblocks -= nblocks % 8;
>>>> + __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
>>>> + bytes -= nblocks * CHACHA20_BLOCK_SIZE;
>>>> + dst += nblocks * CHACHA20_BLOCK_SIZE;
>>>> + src += nblocks * CHACHA20_BLOCK_SIZE;
>>>> + }
>>>> +
>>>> if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
>>>> {
>>>> size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>>>> --
>>>> 2.32.0
>>>>
>>>
>>> Do you want optimization comments or do that later?
>>
>> Ideally I would like to check whether the proposed arc4random implementation
>> is what we want (with the current approach of using atfork handlers and the
>> key reschedule).  The cipher itself is not of the utmost importance, in the
>> sense that it is transparent to the user and we can eventually replace it if
>> there is any issue with, or attack on, ChaCha20.  Initially I won't add any
>> arch-specific optimization, but since libgcrypt provides some that fit the
>> current approach I thought it would be a nice thing to have.
>>
>> For optimization comments it would be good to sync with libgcrypt as well;
>> I think the project will be interested in any performance improvements
>> you might have for the chacha implementations.
> Okay, I'll probably take a stab at this in the not too distant future.
Thanks.
Thread overview: 34+ messages
2022-04-13 20:23 [PATCH 0/7] Add arc4random support Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 1/7] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 2/7] stdlib: Add arc4random tests Adhemerval Zanella
2022-04-14 18:01 ` Noah Goldstein
2022-04-13 20:23 ` [PATCH 3/7] benchtests: Add arc4random benchtest Adhemerval Zanella
2022-04-14 19:17 ` Noah Goldstein
2022-04-14 19:48 ` Adhemerval Zanella
2022-04-14 20:33 ` Noah Goldstein
2022-04-14 20:48 ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 4/7] x86: Add SSSE3 optimized chacha20 Adhemerval Zanella
2022-04-13 23:12 ` Noah Goldstein
2022-04-14 17:03 ` Adhemerval Zanella
2022-04-14 17:10 ` Noah Goldstein
2022-04-14 17:18 ` Adhemerval Zanella
2022-04-14 17:22 ` Noah Goldstein
2022-04-14 18:25 ` Adhemerval Zanella
2022-04-14 17:17 ` Noah Goldstein
2022-04-14 18:11 ` Adhemerval Zanella
2022-04-14 19:25 ` Noah Goldstein
2022-04-14 19:40 ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 5/7] x86: Add AVX2 " Adhemerval Zanella
2022-04-13 23:04 ` Noah Goldstein
2022-04-14 17:16 ` Adhemerval Zanella
2022-04-14 17:20 ` Noah Goldstein
2022-04-14 18:12 ` Adhemerval Zanella [this message]
2022-04-13 20:24 ` [PATCH 6/7] aarch64: Add " Adhemerval Zanella
2022-04-13 20:24 ` [PATCH 7/7] powerpc64: " Adhemerval Zanella
2022-04-14 7:36 ` [PATCH 0/7] Add arc4random support Yann Droneaud
2022-04-14 18:39 ` Adhemerval Zanella
2022-04-14 18:43 ` Noah Goldstein
2022-04-15 10:22 ` Yann Droneaud
2022-04-14 11:49 ` Cristian Rodríguez
2022-04-14 19:26 ` Adhemerval Zanella
2022-04-14 20:36 ` Noah Goldstein