public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH 4/7] x86: Add SSSE3 optimized chacha20
Date: Thu, 14 Apr 2022 12:22:41 -0500	[thread overview]
Message-ID: <CAFUsyfL9OgXzJhVrNrE=8PZr+++mNzn4BYsCFK8hw+e9nftk5Q@mail.gmail.com> (raw)
In-Reply-To: <f5d4687d-5549-74c5-42be-cab2410d69b1@linaro.org>

On Thu, Apr 14, 2022 at 12:19 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 14/04/2022 14:10, Noah Goldstein wrote:
> > On Thu, Apr 14, 2022 at 12:03 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 13/04/2022 20:12, Noah Goldstein wrote:
> >>> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha
> >>> <libc-alpha@sourceware.org> wrote:
> >>>>
> >>>> +
> >>>> +       /* eax zeroed by round loop. */
> >>>> +       leave;
> >>>> +       cfi_adjust_cfa_offset(-8)
> >>>> +       cfi_def_cfa_register(%rsp);
> >>>> +       ret;
> >>>> +       int3;
> >>> why int3?
> >>
> >> It was originally added on libgcrypt by 11ade08efbfbc36dbf3571f1026946269950bc40,
> >> as a straight-line speculation hardening.  It is was is emitted by clang 14 and
> >> gcc 12 with -mharden-sls=return.
> >>
> >> I am not sure if we need that kind of hardening, but I would prefer to the first
> >> version be in sync with libgcrypt as much as possible so the future optimizations
> >> would be simpler to keep localized to glibc (if libgcrypt does not want to
> >> backport it).
> >
> > Okay, can keep for now. Any thoughts on changing it to sse2?
> >
>
> No strong feeling, I used the ssse3 one because it is readily available from
> libgcrypt.

I think the only ssse3 is `pshufb` so you can just replace the optimized
rotates with the shift rotates and that will make it sse2 (unless I'm missing
an instruction).

Also can you add the proper .text section here as well (or .sse2 or .ssse3)

>
> >
> >>
> >>>> +END (__chacha20_ssse3_blocks8)
> >>>> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
> >>>> new file mode 100644
> >>>> index 0000000000..37a4fdfb1f
> >>>> --- /dev/null
> >>>> +++ b/sysdeps/x86_64/chacha20_arch.h
> >>>> @@ -0,0 +1,42 @@
> >>>> +/* Chacha20 implementation, used on arc4random.
> >>>> +   Copyright (C) 2022 Free Software Foundation, Inc.
> >>>> +   This file is part of the GNU C Library.
> >>>> +
> >>>> +   The GNU C Library is free software; you can redistribute it and/or
> >>>> +   modify it under the terms of the GNU Lesser General Public
> >>>> +   License as published by the Free Software Foundation; either
> >>>> +   version 2.1 of the License, or (at your option) any later version.
> >>>> +
> >>>> +   The GNU C Library is distributed in the hope that it will be useful,
> >>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >>>> +   Lesser General Public License for more details.
> >>>> +
> >>>> +   You should have received a copy of the GNU Lesser General Public
> >>>> +   License along with the GNU C Library; if not, see
> >>>> +   <http://www.gnu.org/licenses/>.  */
> >>>> +
> >>>> +#include <ldsodefs.h>
> >>>> +#include <cpu-features.h>
> >>>> +#include <sys/param.h>
> >>>> +
> >>>> +unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
> >>>> +                                      const uint8_t *src, size_t nblks);
> >>>> +
> >>>> +static inline void
> >>>> +chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
> >>>> +               size_t bytes)
> >>>> +{
> >>>> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
> >>>
> >>> Can we make this an ifunc?
> >>
> >> I though about it, but if you check on arc4random implementation the
> >> chacha20_crypt is called for the whole internal buf once it is exhausted.
> >> Assuming a 1 cycle per byte (as indicated by bench-slope libgrcypt on
> >> my machine), it will be at least 1k cycles to encrypt each block.  I
> >> am not sure if setting up an internal PLT call to save a couple of cycles
> >> on a internal function will really show anything significant here (assuming
> >> that the PLT call won't add more overhead in fact).
> >>
> >> Besides that the code boilerplate to setup the internal ifunc is also
> >> way more complex.
> >
> > Okay for now as long as open to changing later (not that we will but that
> > this isn't locking us into the decision).
>
> For sure, if iFUNC does help on this case the change should be simple for
> the generic code.  The boilerplate is for the x86_64 bits in facts (to
> setup the iFUNC resolver, Makefile, etc.).

  reply	other threads:[~2022-04-14 17:22 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 20:23 [PATCH 0/7] Add arc4random support Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 1/7] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 2/7] stdlib: Add arc4random tests Adhemerval Zanella
2022-04-14 18:01   ` Noah Goldstein
2022-04-13 20:23 ` [PATCH 3/7] benchtests: Add arc4random benchtest Adhemerval Zanella
2022-04-14 19:17   ` Noah Goldstein
2022-04-14 19:48     ` Adhemerval Zanella
2022-04-14 20:33       ` Noah Goldstein
2022-04-14 20:48         ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 4/7] x86: Add SSSE3 optimized chacha20 Adhemerval Zanella
2022-04-13 23:12   ` Noah Goldstein
2022-04-14 17:03     ` Adhemerval Zanella
2022-04-14 17:10       ` Noah Goldstein
2022-04-14 17:18         ` Adhemerval Zanella
2022-04-14 17:22           ` Noah Goldstein [this message]
2022-04-14 18:25             ` Adhemerval Zanella
2022-04-14 17:17   ` Noah Goldstein
2022-04-14 18:11     ` Adhemerval Zanella
2022-04-14 19:25   ` Noah Goldstein
2022-04-14 19:40     ` Adhemerval Zanella
2022-04-13 20:23 ` [PATCH 5/7] x86: Add AVX2 " Adhemerval Zanella
2022-04-13 23:04   ` Noah Goldstein
2022-04-14 17:16     ` Adhemerval Zanella
2022-04-14 17:20       ` Noah Goldstein
2022-04-14 18:12         ` Adhemerval Zanella
2022-04-13 20:24 ` [PATCH 6/7] aarch64: Add " Adhemerval Zanella
2022-04-13 20:24 ` [PATCH 7/7] powerpc64: " Adhemerval Zanella
2022-04-14  7:36 ` [PATCH 0/7] Add arc4random support Yann Droneaud
2022-04-14 18:39   ` Adhemerval Zanella
2022-04-14 18:43     ` Noah Goldstein
2022-04-15 10:22     ` Yann Droneaud
2022-04-14 11:49 ` Cristian Rodríguez
2022-04-14 19:26   ` Adhemerval Zanella
2022-04-14 20:36     ` Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFUsyfL9OgXzJhVrNrE=8PZr+++mNzn4BYsCFK8hw+e9nftk5Q@mail.gmail.com' \
    --to=goldstein.w.n@gmail.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).