From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x334.google.com (mail-ot1-x334.google.com [IPv6:2607:f8b0:4864:20::334]) by sourceware.org (Postfix) with ESMTPS id EA38F3857347 for ; Wed, 13 Jul 2022 19:32:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EA38F3857347 Received: by mail-ot1-x334.google.com with SMTP id e1-20020a05683013c100b0061c1a6b8d11so9064378otq.8 for ; Wed, 13 Jul 2022 12:32:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:organization:in-reply-to :content-transfer-encoding; bh=NKX9Q8rkmTxdB2ajkhJ8RE2yL1pjN77m8YWOU3y/0NU=; b=fsi2cBdiF5zqYVGTo3fV7TQRJSNaPXgquPf8HaFEa41TpM3voyh1cBfjvc9yNj0ucd 2hJyXaebjsKAslQls+bWGUvWqjORgdWKDpYoGycpNjXMqcz7vYhF7GIrTIrbthGt4gyw iB3NNyjxdHbamspAfIOOFLlUs892DXF4vF+CDvxAK1dl+9rxU/yTxeNS9JJQNzrgsC+g rSzkP2kagBG3bYD8Ap8Py5EzWH9aRhaLGAEOz4812Qhpyhn/FcIwn+vxLFP8BXHE29DO 0xyXHnq+96SdXov6/NHaKVLMKmVnos6XhbC5uPxb6tx2TTD6Ap5Z3fIrV97SUCCAWN7B qRzw== X-Gm-Message-State: AJIora8XaSq4TAtzi6+T9XX5wyX+bTwQAuZL5GUzFuIAX5xBUmK4kpfL gNZCpDjIg4taPlJteZ19hUxPNw== X-Google-Smtp-Source: AGRyM1tm+5o9naU4hgBrdLn+TMsr2DyCFtXCbWYCDd7m3Ar0JvoSVjXsgxDaZ8rJUhRoJIMRLDqZsw== X-Received: by 2002:a9d:7098:0:b0:61c:3c53:2e8a with SMTP id l24-20020a9d7098000000b0061c3c532e8amr2030294otj.378.1657740721268; Wed, 13 Jul 2022 12:32:01 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:19c3:4427:c171:4fa9:c3d9? ([2804:431:c7ca:19c3:4427:c171:4fa9:c3d9]) by smtp.gmail.com with ESMTPSA id y6-20020a056870458600b001089aef1815sm6424571oao.20.2022.07.13.12.31.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 13 Jul 2022 12:32:00 -0700 (PDT) Message-ID: Date: Wed, 13 Jul 2022 16:31:57 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.0.2 Subject: Re: [PATCH v9 6/9] x86: Add AVX2 optimized chacha20 Content-Language: en-US To: Noah Goldstein Cc: GNU C Library , Florian Weimer References: <20220713173657.516725-1-adhemerval.zanella@linaro.org> <20220713173657.516725-7-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Jul 2022 19:32:04 -0000 On 13/07/22 15:07, Noah Goldstein wrote: > On Wed, Jul 13, 2022 at 10:40 AM Adhemerval Zanella via Libc-alpha > wrote: >> >> From: Adhemerval Zanella Netto >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h >> index 5738c840a9..bfdc6c0a36 100644 >> --- a/sysdeps/x86_64/chacha20_arch.h >> +++ b/sysdeps/x86_64/chacha20_arch.h >> @@ -23,16 +23,26 @@ >> unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, >> const uint8_t *src, size_t nblks) >> attribute_hidden; >> +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, >> + const uint8_t *src, size_t nblks) >> + attribute_hidden; >> >> static inline void >> chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, >> size_t bytes) >> { >> - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, >> - "CHACHA20_BUFSIZE not multiple of 4"); >> - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, >> - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); >> + _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, >> + "CHACHA20_BUFSIZE not multiple of 4 or 8"); >> + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, >> + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); >> + const struct cpu_features* cpu_features = __get_cpu_features (); >> >> - __chacha20_sse2_blocks4 (state, dst, src, >> - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); >> + /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ > > Since `arc4random ()` might need to read from /dev/urandom I don't > think this function could ever truly be RTM safe so we may not care.> > If im missing something we do want to support RTM, should there be a > '!CPU_FEATURE_USABLE_P (cpu_features, RTM)' check for the avx2 > implementation? > I don't fully recall the issue regarding RTM to be sincere (just that we had to rework some ifunc selection to handle it). > > >> + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) >> + && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER)) > > Can you use the X86_ISA_* macro? > > In this case the code would be: > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) > && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) Yes it would work. I have changed to the following to support the x86_64 isa work you have been doing: -- #if MINIMUM_X86_ISA_LEVEL > 2 __chacha20_avx2_blocks8 (state, dst, src, CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); #else const struct cpu_features* cpu_features = __get_cpu_features (); /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. */ if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) __chacha20_avx2_blocks8 (state, dst, src, CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); else __chacha20_sse2_blocks4 (state, dst, src, CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); #endif -- I am aware that X86_ISA_CPU_FEATURE_USABLE_P will const-eval to 1 if the ISA is higher enough, but I think the code is slight clear (specially when the reader is not aware of the const-eval). > > >> + __chacha20_avx2_blocks8 (state, dst, src, >> + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); >> + else >> + __chacha20_sse2_blocks4 (state, dst, src, >> + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); >> } >> -- >> 2.34.1 >>