From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by sourceware.org (Postfix) with ESMTPS id 6FDDF3858419 for ; Wed, 13 Jul 2022 20:24:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6FDDF3858419 Received: by mail-ot1-x332.google.com with SMTP id n12-20020a9d64cc000000b00616ebd87fc4so9163251otl.7 for ; Wed, 13 Jul 2022 13:24:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BHZOZOiur8o4XegKSicVqxJTcgnPyrTVSnyFe72XJ1I=; b=l/SinkpcNlcGVrqQ6zr9evd3PuVmTwSU/maUul2yCk4ai7Wjysd6OfbXGOu0EylZFP GVgkckL2kafr+Lck+qGR1sFNUUlDiCddrUXM3DpbFnk5AP/jvTfYbhsF1vVaAkUOywIr 9/cPrqUZ7tL96QPYcc1comYGnbJmw6CLRDv3LC5Fphf4X0fFWLKdPcgIaEkLl7QYeGTs 55Fm1IDrJyCBSFeLtQXHNGkaMkfVe+zDbMWEYcgoBn3NMCnM0BLqYvMA6nkvYpYjZ/d4 LU2YH69SePLZ4d3WTPVFvmkroMtYeViRQ2xSmja3ckeU1+soEQhqfWuYIg4tCyeQrt1P 66lw== X-Gm-Message-State: AJIora8PFHlRtmAGq34KoJgs4xKmsww2tdZxyWbahHfDiFnH8l6bikSv 9iMDzbKGeQBOKogSrZ7mSKNUKj1zS4Tv3MwQ37g= X-Google-Smtp-Source: AGRyM1tFuhQ6yyxKsTMT69XXS8FnhNccImOIGoAUDSm0xPA01VLZvzlFvRtte4zVtYs/Q5nejaPp/ucrwIm57EXt5do= X-Received: by 2002:a05:6830:4420:b0:616:e569:8ae9 with SMTP id q32-20020a056830442000b00616e5698ae9mr1969415otv.265.1657743871784; Wed, 13 Jul 2022 13:24:31 -0700 (PDT) MIME-Version: 1.0 References: <20220713173657.516725-1-adhemerval.zanella@linaro.org> <20220713173657.516725-7-adhemerval.zanella@linaro.org> In-Reply-To: From: Noah Goldstein Date: Wed, 13 Jul 2022 13:24:19 -0700 Message-ID: Subject: Re: [PATCH v9 6/9] x86: Add AVX2 optimized chacha20 To: Adhemerval Zanella Netto Cc: GNU C Library , Florian Weimer Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Jul 2022 20:24:34 -0000 On Wed, Jul 13, 2022 at 12:32 PM Adhemerval Zanella Netto wrote: > > > > On 13/07/22 15:07, Noah Goldstein wrote: > > On Wed, Jul 13, 2022 at 10:40 AM Adhemerval Zanella via Libc-alpha > > wrote: > >> > >> From: Adhemerval Zanella Netto > >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h > >> index 5738c840a9..bfdc6c0a36 100644 > >> --- a/sysdeps/x86_64/chacha20_arch.h > >> +++ b/sysdeps/x86_64/chacha20_arch.h > >> @@ -23,16 +23,26 @@ > >> unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, > >> const uint8_t *src, size_t nblks) > >> attribute_hidden; > >> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, > >> + const uint8_t *src, size_t nblks) > >> + attribute_hidden; > >> > >> static inline void > >> chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, > >> size_t bytes) > >> { > >> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, > >> - "CHACHA20_BUFSIZE not multiple of 4"); > >> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > >> - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); > >> + _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, > >> + "CHACHA20_BUFSIZE not multiple of 4 or 8"); > >> + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, > >> + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); > >> + const struct cpu_features* cpu_features = __get_cpu_features (); > >> > >> - __chacha20_sse2_blocks4 (state, dst, src, > >> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > >> + /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ > > > > Since `arc4random ()` might need to read from /dev/urandom I don't > > think this function could ever truly be RTM safe so we may not care.> > > If im missing something we do want to support RTM, should there be a > > '!CPU_FEATURE_USABLE_P (cpu_features, RTM)' check for the avx2 > > implementation? > > > I don't fully recall the issue regarding RTM to be sincere (just that > we had to rework some ifunc selection to handle it). In this case we don't need to support RTM so no need. > > > > > > >> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) > >> + && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER)) > > > > Can you use the X86_ISA_* macro? > > > > In this case the code would be: > > > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) > > && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) > > Yes it would work. I have changed to the following to support the x86_64 isa > work you have been doing: Thanks. > > -- > #if MINIMUM_X86_ISA_LEVEL > 2 > __chacha20_avx2_blocks8 (state, dst, src, > CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > #else > const struct cpu_features* cpu_features = __get_cpu_features (); > > /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) > && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) > __chacha20_avx2_blocks8 (state, dst, src, > CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > else > __chacha20_sse2_blocks4 (state, dst, src, > CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > #endif > -- > > I am aware that X86_ISA_CPU_FEATURE_USABLE_P will const-eval to 1 if the > ISA is higher enough, but I think the code is slight clear (specially > when the reader is not aware of the const-eval). > Think this is good. Thanks and sorry for the last minute requests. > > > > > > >> + __chacha20_avx2_blocks8 (state, dst, src, > >> + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > >> + else > >> + __chacha20_sse2_blocks4 (state, dst, src, > >> + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > >> } > >> -- > >> 2.34.1 > >>