From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Apr 2022 15:12:28 -0300 [thread overview]
Message-ID: <a3845d9c-1182-5b1c-1113-89d16c2e89db@linaro.org> (raw)
In-Reply-To: <CAFUsyfLFsWN=bPabMZonSAg0TX_92btt5vfA5FPPKDc-jUxw9g@mail.gmail.com>
On 14/04/2022 14:20, Noah Goldstein wrote:
> On Thu, Apr 14, 2022 at 12:17 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 13/04/2022 20:04, Noah Goldstein wrote:
>>> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>>
>>>> + .text
>>>
>>> section avx2
>>>
>>
>> Ack, I changed to '.section .text.avx2, "ax", @progbits'.
>>
>>>> + .align 32
>>>> +chacha20_data:
>>>> +L(shuf_rol16):
>>>> + .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
>>>> +L(shuf_rol8):
>>>> + .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
>>>> +L(inc_counter):
>>>> + .byte 0,1,2,3,4,5,6,7
>>>> +L(unsigned_cmp):
>>>> + .long 0x80000000
>>>> +
>>>> +ENTRY (__chacha20_avx2_blocks8)
>>>> + /* input:
>>>> + * %rdi: input
>>>> + * %rsi: dst
>>>> + * %rdx: src
>>>> + * %rcx: nblks (multiple of 8)
>>>> + */
>>>> + vzeroupper;
>>>
>>> vzeroupper needs to be replaced with VZEROUPPER_RETURN
>>> and we need a transaction safe version unless this can never
>>> be called during a transaction.
>>
>> I think you meant VZEROUPPER here (VZEROUPPER_RETURN seems to trigger
>> test case failures).  What do you mean by a 'transaction safe version'?
>> An extra __chacha20_avx2_blocks8 implementation to handle it?  Or
>> disabling it if RTM is enabled?
>
> For now you can just update the cpufeature check to do ssse3 if RTM is enabled.
Right, I will do it.
>
>>
>>>> +
>>>> + /* clear the used vector registers and stack */
>>>> + vpxor X0, X0, X0;
>>>> + vmovdqa X0, (STACK_VEC_X12)(%rsp);
>>>> + vmovdqa X0, (STACK_VEC_X13)(%rsp);
>>>> + vmovdqa X0, (STACK_TMP)(%rsp);
>>>> + vmovdqa X0, (STACK_TMP1)(%rsp);
>>>> + vzeroall;
>>>
>>> Do you need vzeroall?
>>> Why not vzeroupper? Is it a security concern to leave info in the xmm pieces?
>>
>> I would assume so, since it is in the original libgcrypt optimization.  As
>> with the ssse3 version, I am not sure we really need that level of
>> hardening, but it would be good to keep the initial revision as close
>> as possible to libgcrypt.
>
> Got it.
>
>>
>>>
>>>
>>>> +
>>>> + /* eax zeroed by round loop. */
>>>> + leave;
>>>> + cfi_adjust_cfa_offset(-8)
>>>> + cfi_def_cfa_register(%rsp);
>>>> + ret;
>>>> + int3;
>>>
>>> Why do we need int3 here?
>>
>> I think the ssse3 answer applies here as well.
>>
>>>> +END(__chacha20_avx2_blocks8)
>>>> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
>>>> index 37a4fdfb1f..7e9e7755f3 100644
>>>> --- a/sysdeps/x86_64/chacha20_arch.h
>>>> +++ b/sysdeps/x86_64/chacha20_arch.h
>>>> @@ -22,11 +22,25 @@
>>>>
>>>> unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
>>>> const uint8_t *src, size_t nblks);
>>>> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
>>>> + const uint8_t *src, size_t nblks);
>>>>
>>>> static inline void
>>>> chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
>>>> size_t bytes)
>>>> {
>>>> + const struct cpu_features* cpu_features = __get_cpu_features ();
>>>
>>> Can we do this with an ifunc and take the cpufeature check off the critical
>>> path?
>>
>> Ditto.
>>
>>>> +
>>>> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
>>>> + {
>>>> + size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>>>> + nblocks -= nblocks % 8;
>>>> + __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
>>>> + bytes -= nblocks * CHACHA20_BLOCK_SIZE;
>>>> + dst += nblocks * CHACHA20_BLOCK_SIZE;
>>>> + src += nblocks * CHACHA20_BLOCK_SIZE;
>>>> + }
>>>> +
>>>> if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
>>>> {
>>>> size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>>>> --
>>>> 2.32.0
>>>>
>>>
>>> Do you want optimization comments or do that later?
>>
>> Ideally I would like to check whether the proposed arc4random implementation
>> is what we want (with the current approach of using atfork handlers and the
>> key reschedule).  The cipher itself is not of the utmost importance, in the
>> sense that it is transparent to the user and we can eventually replace it if
>> there is any issue with, or attack on, ChaCha20.  Initially I won't add any
>> arch-specific optimization, but since libgcrypt provides some that fit the
>> current approach I thought it would be a nice thing to have.
>>
>> For optimization comments it would be good to sync with libgcrypt as well;
>> I think the project will be interested in any performance improvements
>> you might have for the chacha implementations.
> Okay, I'll probably take a stab at this in the not too distant future.
Thanks.
Thread overview: 34+ messages
2022-04-13 20:23 [PATCH 0/7] Add arc4random support Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 1/7] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 2/7] stdlib: Add arc4random tests Adhemerval Zanella
2022-04-14 18:01 ` Noah Goldstein
2022-04-13 20:23 ` [PATCH 3/7] benchtests: Add arc4random benchtest Adhemerval Zanella
2022-04-14 19:17 ` Noah Goldstein
2022-04-14 19:48 ` Adhemerval Zanella
2022-04-14 20:33 ` Noah Goldstein
2022-04-14 20:48 ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 4/7] x86: Add SSSE3 optimized chacha20 Adhemerval Zanella
2022-04-13 23:12 ` Noah Goldstein
2022-04-14 17:03 ` Adhemerval Zanella
2022-04-14 17:10 ` Noah Goldstein
2022-04-14 17:18 ` Adhemerval Zanella
2022-04-14 17:22 ` Noah Goldstein
2022-04-14 18:25 ` Adhemerval Zanella
2022-04-14 17:17 ` Noah Goldstein
2022-04-14 18:11 ` Adhemerval Zanella
2022-04-14 19:25 ` Noah Goldstein
2022-04-14 19:40 ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 5/7] x86: Add AVX2 " Adhemerval Zanella
2022-04-13 23:04 ` Noah Goldstein
2022-04-14 17:16 ` Adhemerval Zanella
2022-04-14 17:20 ` Noah Goldstein
2022-04-14 18:12 ` Adhemerval Zanella [this message]
2022-04-13 20:24 ` [PATCH 6/7] aarch64: Add " Adhemerval Zanella
2022-04-13 20:24 ` [PATCH 7/7] powerpc64: " Adhemerval Zanella
2022-04-14 7:36 ` [PATCH 0/7] Add arc4random support Yann Droneaud
2022-04-14 18:39 ` Adhemerval Zanella
2022-04-14 18:43 ` Noah Goldstein
2022-04-15 10:22 ` Yann Droneaud
2022-04-14 11:49 ` Cristian Rodríguez
2022-04-14 19:26 ` Adhemerval Zanella
2022-04-14 20:36 ` Noah Goldstein