From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Apr 2022 12:20:20 -0500 [thread overview]
Message-ID: <CAFUsyfLFsWN=bPabMZonSAg0TX_92btt5vfA5FPPKDc-jUxw9g@mail.gmail.com> (raw)
In-Reply-To: <aa0d6500-69ab-7e36-eaf8-2159934bf883@linaro.org>
On Thu, Apr 14, 2022 at 12:17 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 13/04/2022 20:04, Noah Goldstein wrote:
> > On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> + .text
> >
> > section avx2
> >
>
> Ack, I changed to '.section .text.avx2, "ax", @progbits'.
>
> >> + .align 32
> >> +chacha20_data:
> >> +L(shuf_rol16):
> >> + .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
> >> +L(shuf_rol8):
> >> + .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
> >> +L(inc_counter):
> >> + .byte 0,1,2,3,4,5,6,7
> >> +L(unsigned_cmp):
> >> + .long 0x80000000
> >> +
> >> +ENTRY (__chacha20_avx2_blocks8)
> >> + /* input:
> >> + * %rdi: input
> >> + * %rsi: dst
> >> + * %rdx: src
> >> + * %rcx: nblks (multiple of 8)
> >> + */
> >> + vzeroupper;
> >
> > vzeroupper needs to be replaced with VZEROUPPER_RETURN
> > and we need a transaction safe version unless this can never
> > be called during a transaction.
>
> I think you meant VZEROUPPER here (VZEROUPPER_RETURN seems to trigger
> test case failures). What do you mean by a 'transaction safe version'?
> An extra __chacha20_avx2_blocks8 implementation to handle it? Or
> disabling it if RTM is enabled?
For now you can just update the cpufeature check to do ssse3 if RTM is enabled.
>
> >> +
> >> + /* clear the used vector registers and stack */
> >> + vpxor X0, X0, X0;
> >> + vmovdqa X0, (STACK_VEC_X12)(%rsp);
> >> + vmovdqa X0, (STACK_VEC_X13)(%rsp);
> >> + vmovdqa X0, (STACK_TMP)(%rsp);
> >> + vmovdqa X0, (STACK_TMP1)(%rsp);
> >> + vzeroall;
> >
> > Do you need vzeroall?
> > Why not vzeroupper? Is it a security concern to leave info in the xmm pieces?
>
> I would assume so, since it is in the original libgcrypt optimization. As
> for the ssse3 version, I am not sure if we really need that level of
> hardening, but it would be good to have the initial revision as close
> as possible to libgcrypt.
Got it.
>
> >
> >
> >> +
> >> + /* eax zeroed by round loop. */
> >> + leave;
> >> + cfi_adjust_cfa_offset(-8)
> >> + cfi_def_cfa_register(%rsp);
> >> + ret;
> >> + int3;
> >
> > Why do we need int3 here?
>
> I think the ssse3 applies here as well.
>
> >> +END(__chacha20_avx2_blocks8)
> >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> >> index 37a4fdfb1f..7e9e7755f3 100644
> >> --- a/sysdeps/x86_64/chacha20_arch.h
> >> +++ b/sysdeps/x86_64/chacha20_arch.h
> >> @@ -22,11 +22,25 @@
> >>
> >> unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
> >> const uint8_t *src, size_t nblks);
> >> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
> >> + const uint8_t *src, size_t nblks);
> >>
> >> static inline void
> >> chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
> >> size_t bytes)
> >> {
> >> + const struct cpu_features* cpu_features = __get_cpu_features ();
> >
> > Can we do this with an ifunc and take the cpufeature check off the critical
> > path?
>
> Ditto.
>
> >> +
> >> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
> >> + {
> >> + size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
> >> + nblocks -= nblocks % 8;
> >> + __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
> >> + bytes -= nblocks * CHACHA20_BLOCK_SIZE;
> >> + dst += nblocks * CHACHA20_BLOCK_SIZE;
> >> + src += nblocks * CHACHA20_BLOCK_SIZE;
> >> + }
> >> +
> >> if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
> >> {
> >> size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
> >> --
> >> 2.32.0
> >>
> >
> > Do you want optimization comments or do that later?
>
> Ideally I would like to check if the proposed arc4random implementation
> is what we want (with the current approach of using atfork handlers and
> the key reschedule). The cipher itself is not of the utmost importance,
> in the sense that it is transparent to the user and we can eventually
> replace it if there is any issue with or attack on ChaCha20. Initially
> I won't add any arch-specific optimizations, but since libgcrypt
> provides some that fit the current approach I thought it would be a
> nice thing to have them.
>
> For optimization comments it would be good to sync with libgcrypt as well,
> I think the project will be interested in any performance improvement
> you might have for the chacha implementations.
Okay, I'll probably take a stab at this in the not-too-distant future.