From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1136.google.com (mail-yw1-x1136.google.com [IPv6:2607:f8b0:4864:20::1136]) by sourceware.org (Postfix) with ESMTPS id 1F2AE3858C53 for ; Thu, 14 Apr 2022 17:10:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1F2AE3858C53 Received: by mail-yw1-x1136.google.com with SMTP id 00721157ae682-2db2add4516so62452267b3.1 for ; Thu, 14 Apr 2022 10:10:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=PgGc6nfUg7oSH9Z8P1YwzvfR4gffXcXxjP+7zc1ncyY=; b=E8mpfloRAOPywypEs9NQBtyoG6jVrO6eOQkfSx61m20VN8kxQdSLcwLj/D4sauHaIi pqfsg6n6VFLl470I9Qsu8Nl2A7upHl22Q5GeWMDwO8PBUfVzGoXQzb7PqDauhjfmv2Ao R7Zqczj0U23b2kypR1gxxWRbWUdpLAwn1pA5dD3se/fxOH0+dTe7wJZB41MS3Lu1vynf tbLHgpO+pklxkjFm68yRvxnstbo7qTkrQMCsgjvr/XjVeZP9mAQ4+zkVSDF6L1a5w8kJ SGq6S/J933dPw/l4HcIJrHuYjaKY2izSvzOz/WXShmjgivReU6LFYOh2dvsC7RU2mEv6 bwsg== X-Gm-Message-State: AOAM530k/8kshxT9O25NDSMfSJfw36CDjjoxE++vcXAY3HBW7b6j1W3c 2/bMZD7U++qiqR/QdTeBK/G7CtaUfrIt8ohNZWk= X-Google-Smtp-Source: ABdhPJz0pY8m5fuPDlh9zzYsw4EsT9OcsnhTB8oeTFNdsYFaMxiJTttIjVWhITeroF874DbHkAWvTE+hCSxS+ZcJ17E= X-Received: by 2002:a05:690c:d:b0:2d0:e02a:6cda with SMTP id bc13-20020a05690c000d00b002d0e02a6cdamr2950544ywb.192.1649956222503; Thu, 14 Apr 2022 10:10:22 -0700 (PDT) MIME-Version: 1.0 References: <20220413202401.408267-1-adhemerval.zanella@linaro.org> <20220413202401.408267-5-adhemerval.zanella@linaro.org> In-Reply-To: From: Noah Goldstein Date: Thu, 14 Apr 2022 12:10:12 -0500 Message-ID: Subject: Re: [PATCH 4/7] x86: Add SSSE3 optimized chacha20 To: Adhemerval Zanella Cc: GNU C Library Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Apr 2022 17:10:24 -0000 On Thu, Apr 14, 2022 at 12:03 PM Adhemerval Zanella wrote: > > > > On 13/04/2022 20:12, Noah Goldstein wrote: > > On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha > > wrote: > >> > >> + > >> + /* eax zeroed by round loop. */ > >> + leave; > >> + cfi_adjust_cfa_offset(-8) > >> + cfi_def_cfa_register(%rsp); > >> + ret; > >> + int3; > > why int3? > > It was originally added on libgcrypt by 11ade08efbfbc36dbf3571f1026946269950bc40, > as a straight-line speculation hardening. It is was is emitted by clang 14 and > gcc 12 with -mharden-sls=return. > > I am not sure if we need that kind of hardening, but I would prefer to the first > version be in sync with libgcrypt as much as possible so the future optimizations > would be simpler to keep localized to glibc (if libgcrypt does not want to > backport it). Okay, can keep for now. Any thoughts on changing it to sse2? > > >> +END (__chacha20_ssse3_blocks8) > >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h > >> new file mode 100644 > >> index 0000000000..37a4fdfb1f > >> --- /dev/null > >> +++ b/sysdeps/x86_64/chacha20_arch.h > >> @@ -0,0 +1,42 @@ > >> +/* Chacha20 implementation, used on arc4random. > >> + Copyright (C) 2022 Free Software Foundation, Inc. > >> + This file is part of the GNU C Library. > >> + > >> + The GNU C Library is free software; you can redistribute it and/or > >> + modify it under the terms of the GNU Lesser General Public > >> + License as published by the Free Software Foundation; either > >> + version 2.1 of the License, or (at your option) any later version. > >> + > >> + The GNU C Library is distributed in the hope that it will be useful, > >> + but WITHOUT ANY WARRANTY; without even the implied warranty of > >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > >> + Lesser General Public License for more details. > >> + > >> + You should have received a copy of the GNU Lesser General Public > >> + License along with the GNU C Library; if not, see > >> + . */ > >> + > >> +#include > >> +#include > >> +#include > >> + > >> +unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst, > >> + const uint8_t *src, size_t nblks); > >> + > >> +static inline void > >> +chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src, > >> + size_t bytes) > >> +{ > >> + if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4) > > > > Can we make this an ifunc? > > I though about it, but if you check on arc4random implementation the > chacha20_crypt is called for the whole internal buf once it is exhausted. > Assuming a 1 cycle per byte (as indicated by bench-slope libgrcypt on > my machine), it will be at least 1k cycles to encrypt each block. I > am not sure if setting up an internal PLT call to save a couple of cycles > on a internal function will really show anything significant here (assuming > that the PLT call won't add more overhead in fact). > > Besides that the code boilerplate to setup the internal ifunc is also > way more complex. Okay for now as long as open to changing later (not that we will but that this isn't locking us into the decision).