From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Apr 2022 12:20:20 -0500 [thread overview]
Message-ID: <CAFUsyfLFsWN=bPabMZonSAg0TX_92btt5vfA5FPPKDc-jUxw9g@mail.gmail.com> (raw)
In-Reply-To: <aa0d6500-69ab-7e36-eaf8-2159934bf883@linaro.org>
On Thu, Apr 14, 2022 at 12:17 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 13/04/2022 20:04, Noah Goldstein wrote:
> > On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> + .text
> >
> > section avx2
> >
>
> Ack, I changed to '.section .text.avx2, "ax", @progbits'.
>
> >> + .align 32
> >> +chacha20_data:
> >> +L(shuf_rol16):
> >> + .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
> >> +L(shuf_rol8):
> >> + .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
> >> +L(inc_counter):
> >> + .byte 0,1,2,3,4,5,6,7
> >> +L(unsigned_cmp):
> >> + .long 0x80000000
> >> +
> >> +ENTRY (__chacha20_avx2_blocks8)
> >> + /* input:
> >> + * %rdi: input
> >> + * %rsi: dst
> >> + * %rdx: src
> >> + * %rcx: nblks (multiple of 8)
> >> + */
> >> + vzeroupper;
> >
> > vzeroupper needs to be replaced with VZEROUPPER_RETURN
> > and we need a transaction safe version unless this can never
> > be called during a transaction.
>
> I think you meant VZEROUPPER here (VZEROUPPER_RETURN seems to trigger
> test case failures). What do you mean by a 'transaction safe version'?
> An extra __chacha20_avx2_blocks8 implementation to handle it? Or
> disabling it if RTM is enabled?
For now you can just update the cpufeature check to do ssse3 if RTM is enabled.
>
> >> +
> >> + /* clear the used vector registers and stack */
> >> + vpxor X0, X0, X0;
> >> + vmovdqa X0, (STACK_VEC_X12)(%rsp);
> >> + vmovdqa X0, (STACK_VEC_X13)(%rsp);
> >> + vmovdqa X0, (STACK_TMP)(%rsp);
> >> + vmovdqa X0, (STACK_TMP1)(%rsp);
> >> + vzeroall;
> >
> > Do you need vzeroall?
> > Why not vzeroupper? Is it a security concern to leave info in the xmm pieces?
>
> I would assume so, since it is in the original libgcrypt optimization. As
> for the ssse3 version, I am not sure if we really need that level of
> hardening, but it would be good to have the initial revision as close
> as possible to libgcrypt.
Got it.
>
> >
> >
> >> +
> >> + /* eax zeroed by round loop. */
> >> + leave;
> >> + cfi_adjust_cfa_offset(-8)
> >> + cfi_def_cfa_register(%rsp);
> >> + ret;
> >> + int3;
> >
> > Why do we need int3 here?
>
> I think the ssse3 applies here as well.
>
> >> +END(__chacha20_avx2_blocks8)
> >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> >> index 37a4fdfb1f..7e9e7755f3 100644
> >> --- a/sysdeps/x86_64/chacha20_arch.h
> >> +++ b/sysdeps/x86_64/chacha20_arch.h
> >> @@ -22,11 +22,25 @@
> >>
> >> unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
> >> const uint8_t *src, size_t nblks);
> >> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
> >> + const uint8_t *src, size_t nblks);
> >>
> >> static inline void
> >> chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
> >> size_t bytes)
> >> {
> >> + const struct cpu_features* cpu_features = __get_cpu_features ();
> >
> > Can we do this with an ifunc and take the cpufeature check off the critical
> > path?
>
> Ditto.
>
> >> +
> >> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
> >> + {
> >> + size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
> >> + nblocks -= nblocks % 8;
> >> + __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
> >> + bytes -= nblocks * CHACHA20_BLOCK_SIZE;
> >> + dst += nblocks * CHACHA20_BLOCK_SIZE;
> >> + src += nblocks * CHACHA20_BLOCK_SIZE;
> >> + }
> >> +
> >> if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
> >> {
> >> size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
> >> --
> >> 2.32.0
> >>
> >
> > Do you want optimization comments or do that later?
>
> Ideally I would like to check if the proposed arc4random implementation
> is what we want (with the current approach of using atfork handlers and
> the key reschedule). The cipher itself is not of the utmost importance,
> in the sense that it is transparent to the user and we can eventually
> replace it if there is any issue with or attack on ChaCha20. Initially
> I won't add any arch-specific optimizations, but since libgcrypt
> provides some that fit the current approach I thought it would be a
> nice thing to have them.
>
> For optimization comments it would be good to sync with libgcrypt as well,
> I think the project will be interested in any performance improvement
> you might have for the chacha implementations.
Okay, I'll probably take a stab at this in the not-too-distant future.