Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20

public inbox for libc-alpha@sourceware.org

From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Apr 2022 14:16:55 -0300
Message-ID: <aa0d6500-69ab-7e36-eaf8-2159934bf883@linaro.org>
In-Reply-To: <CAFUsyfLDQu8WHfhRO8bth3U5MP30xxBeC-P_5TmpBbcfRU-jaA@mail.gmail.com>



On 13/04/2022 20:04, Noah Goldstein wrote:
> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> +       .text
> 
> section avx2
> 

Ack, I changed it to '.section .text.avx2, "ax", @progbits'.

>> +       .align 32
>> +chacha20_data:
>> +L(shuf_rol16):
>> +       .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
>> +L(shuf_rol8):
>> +       .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
>> +L(inc_counter):
>> +       .byte 0,1,2,3,4,5,6,7
>> +L(unsigned_cmp):
>> +       .long 0x80000000
>> +
>> +ENTRY (__chacha20_avx2_blocks8)
>> +       /* input:
>> +        *      %rdi: input
>> +        *      %rsi: dst
>> +        *      %rdx: src
>> +        *      %rcx: nblks (multiple of 8)
>> +        */
>> +       vzeroupper;
> 
> vzeroupper needs to be replaced with VZEROUPPER_RETURN
> and we need a transaction safe version unless this can never
> be called during a transaction.

I think you meant VZEROUPPER here (VZEROUPPER_RETURN seems to trigger
test case failures).  What do you mean by a 'transaction safe version'?
An extra __chacha20_avx2_blocks8 implementation to handle it?  Or
disabling it if RTM is enabled?

>> +
>> +       /* clear the used vector registers and stack */
>> +       vpxor X0, X0, X0;
>> +       vmovdqa X0, (STACK_VEC_X12)(%rsp);
>> +       vmovdqa X0, (STACK_VEC_X13)(%rsp);
>> +       vmovdqa X0, (STACK_TMP)(%rsp);
>> +       vmovdqa X0, (STACK_TMP1)(%rsp);
>> +       vzeroall;
> 
> Do you need vzeroall?
> Why not vzeroupper? Is it a security concern to leave info in the xmm pieces?

I would assume so, since it is in the original libgcrypt implementation.
As with the ssse3 version, I am not sure we really need that level of
hardening, but it would be good to keep the initial revision as close
as possible to libgcrypt.

> 
> 
>> +
>> +       /* eax zeroed by round loop. */
>> +       leave;
>> +       cfi_adjust_cfa_offset(-8)
>> +       cfi_def_cfa_register(%rsp);
>> +       ret;
>> +       int3;
> 
> Why do we need int3 here?

I think the ssse3 answer applies here as well.

>> +END(__chacha20_avx2_blocks8)
>> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
>> index 37a4fdfb1f..7e9e7755f3 100644
>> --- a/sysdeps/x86_64/chacha20_arch.h
>> +++ b/sysdeps/x86_64/chacha20_arch.h
>> @@ -22,11 +22,25 @@
>>
>>  unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
>>                                        const uint8_t *src, size_t nblks);
>> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
>> +                                     const uint8_t *src, size_t nblks);
>>
>>  static inline void
>>  chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
>>                 size_t bytes)
>>  {
>> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> 
> Can we do this with an ifunc and take the cpufeature check off the critical
> path?

Ditto.

>> +
>> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
>> +    {
>> +      size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>> +      nblocks -= nblocks % 8;
>> +      __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
>> +      bytes -= nblocks * CHACHA20_BLOCK_SIZE;
>> +      dst += nblocks * CHACHA20_BLOCK_SIZE;
>> +      src += nblocks * CHACHA20_BLOCK_SIZE;
>> +    }
>> +
>>    if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
>>      {
>>        size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
>> --
>> 2.32.0
>>
> 
> Do you want optimization comments or do that later?

Ideally I would first like to check whether the proposed arc4random
implementation is what we want (with the current approach of using atfork
handlers and the key reschedule).  The cipher itself is not of the utmost
importance, in the sense that it is transparent to the user and we can
eventually replace it if there is any issue with or attack on ChaCha20.
Initially I was not going to add any arch-specific optimization, but since
libgcrypt provides some that fit the current approach I thought it would be
a nice thing to have.

For optimization comments it would be good to sync with libgcrypt as well.
I think the project will be interested in any performance improvement
you might have for the chacha implementations.
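
To make the block arithmetic in the chacha20_crypt hunk above concrete,
here is a minimal standalone sketch.  It is not the glibc code: BLOCK_SIZE
stands in for CHACHA20_BLOCK_SIZE (assumed to be 64 bytes), the helper and
struct names are hypothetical, and it assumes the SSSE3 path rounds down to
a multiple of 4 blocks, as its threshold in the hunk suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CHACHA20_BLOCK_SIZE; assumed to be 64 bytes.  */
#define BLOCK_SIZE 64

/* Hypothetical summary of how one request is divided between paths.  */
struct split
{
  size_t avx2_blocks;   /* Blocks for the 8-way kernel.  */
  size_t ssse3_blocks;  /* Blocks for the (assumed) 4-way kernel.  */
  size_t tail_bytes;    /* Bytes left for the generic code.  */
};

/* Whole blocks in BYTES, rounded down to a multiple of GROUP.  The
   result is 0 when fewer than GROUP blocks are available, which plays
   the role of the 'bytes >= CHACHA20_BLOCK_SIZE * 8' check.  */
static size_t
round_down_blocks (size_t bytes, size_t group)
{
  assert (group != 0);
  size_t nblocks = bytes / BLOCK_SIZE;
  return nblocks - nblocks % group;
}

/* Mirror the order of the checks in the hunk: widest kernel first,
   then the narrower one, then whatever remains.  */
static struct split
split_request (size_t bytes)
{
  struct split s;
  s.avx2_blocks = round_down_blocks (bytes, 8);
  bytes -= s.avx2_blocks * BLOCK_SIZE;
  s.ssse3_blocks = round_down_blocks (bytes, 4);
  bytes -= s.ssse3_blocks * BLOCK_SIZE;
  s.tail_bytes = bytes;
  return s;
}
```

Under these assumptions a 1000-byte request would split into 8 blocks for
the AVX2 path, 4 blocks for the SSSE3 path, and 232 trailing bytes.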
