From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x333.google.com (mail-ot1-x333.google.com [IPv6:2607:f8b0:4864:20::333]) by sourceware.org (Postfix) with ESMTPS id 86AE63858C53 for ; Thu, 14 Apr 2022 17:19:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 86AE63858C53 Received: by mail-ot1-x333.google.com with SMTP id o20-20020a9d7194000000b005cb20cf4f1bso3829746otj.7 for ; Thu, 14 Apr 2022 10:19:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=Sh4jEv9pimYO81IU/aXBmyCyBl/evEFcZ1XehqsxZJ8=; b=YJj4/2T0uCpid46FyH5eno6QntTiTcmohPNuluwyTaQLms0P0ERmPAxYyZ26yi6ere UHsPz+WNfXYggBVCG/V+A7pKryC8iTt4NOW0HDjJxMOg2DDpEfHO/YOa7DskFktlOkIG vkst+NAVqyx+UyTcXru9WvxDxQ1bmEbUanK6z77D6HHDp+9m08mxBDuZVJm8bAodiaqs wV0g51hTl7/jmKhLgccs8sQvsA5gDnLMX0Zjh//qKopKX4ISqxHGMWGN7MwPFZvI6JaA pQfZOs9pIagVf7qtvQeehFN9E6kFgzHJcbh005lUB0hQWKxrROu8vLhkLGuh8+19bcXT qSgQ== X-Gm-Message-State: AOAM531TDTipRF2+RpCqMsT6VTBWbusXWugfBs6jY102wPZitD6avPm+ mrUkEe6spV8pL0SzxJZ6iX6gug== X-Google-Smtp-Source: ABdhPJzR7Od/4uHyQ6ZsMdCU+3N30XEYhMToNVTXWsWEbjE+rzHo+nCQ8TR7YwDKg2nkVmBVBA5pTA== X-Received: by 2002:a05:6830:1ce:b0:601:73aa:3fbb with SMTP id r14-20020a05683001ce00b0060173aa3fbbmr880565ota.158.1649956745810; Thu, 14 Apr 2022 10:19:05 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:431f:3dc9:7133:8dac:5273? ([2804:431:c7ca:431f:3dc9:7133:8dac:5273]) by smtp.gmail.com with ESMTPSA id j4-20020a9d7384000000b005b23499b66dsm248059otk.23.2022.04.14.10.19.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 14 Apr 2022 10:19:01 -0700 (PDT) Message-ID: Date: Thu, 14 Apr 2022 14:18:59 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH 4/7] x86: Add SSSE3 optimized chacha20 Content-Language: en-US To: Noah Goldstein Cc: GNU C Library References: <20220413202401.408267-1-adhemerval.zanella@linaro.org> <20220413202401.408267-5-adhemerval.zanella@linaro.org> From: Adhemerval Zanella In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-13.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Apr 2022 17:19:08 -0000 On 14/04/2022 14:10, Noah Goldstein wrote: > On Thu, Apr 14, 2022 at 12:03 PM Adhemerval Zanella > wrote: >> >> >> >> On 13/04/2022 20:12, Noah Goldstein wrote: >>> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha >>> wrote: >>>> >>>> + >>>> + /* eax zeroed by round loop. */ >>>> + leave; >>>> + cfi_adjust_cfa_offset(-8) >>>> + cfi_def_cfa_register(%rsp); >>>> + ret; >>>> + int3; >>> why int3? >> >> It was originally added on libgcrypt by 11ade08efbfbc36dbf3571f1026946269950bc40, >> as a straight-line speculation hardening. It is was is emitted by clang 14 and >> gcc 12 with -mharden-sls=return. >> >> I am not sure if we need that kind of hardening, but I would prefer to the first >> version be in sync with libgcrypt as much as possible so the future optimizations >> would be simpler to keep localized to glibc (if libgcrypt does not want to >> backport it). > > Okay, can keep for now. Any thoughts on changing it to sse2? > No strong feeling, I used the ssse3 one because it is readily available from libgcrypt. > >> >>>> +END (__chacha20_ssse3_blocks8) >>>> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h >>>> new file mode 100644 >>>> index 0000000000..37a4fdfb1f >>>> --- /dev/null >>>> +++ b/sysdeps/x86_64/chacha20_arch.h >>>> @@ -0,0 +1,42 @@ >>>> +/* Chacha20 implementation, used on arc4random. >>>> + Copyright (C) 2022 Free Software Foundation, Inc. >>>> + This file is part of the GNU C Library. >>>> + >>>> + The GNU C Library is free software; you can redistribute it and/or >>>> + modify it under the terms of the GNU Lesser General Public >>>> + License as published by the Free Software Foundation; either >>>> + version 2.1 of the License, or (at your option) any later version. >>>> + >>>> + The GNU C Library is distributed in the hope that it will be useful, >>>> + but WITHOUT ANY WARRANTY; without even the implied warranty of >>>> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>>> + Lesser General Public License for more details. >>>> + >>>> + You should have received a copy of the GNU Lesser General Public >>>> + License along with the GNU C Library; if not, see >>>> + . */ >>>> + >>>> +#include >>>> +#include >>>> +#include >>>> + >>>> +unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst, >>>> + const uint8_t *src, size_t nblks); >>>> + >>>> +static inline void >>>> +chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src, >>>> + size_t bytes) >>>> +{ >>>> + if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4) >>> >>> Can we make this an ifunc? >> >> I though about it, but if you check on arc4random implementation the >> chacha20_crypt is called for the whole internal buf once it is exhausted. >> Assuming a 1 cycle per byte (as indicated by bench-slope libgrcypt on >> my machine), it will be at least 1k cycles to encrypt each block. I >> am not sure if setting up an internal PLT call to save a couple of cycles >> on a internal function will really show anything significant here (assuming >> that the PLT call won't add more overhead in fact). >> >> Besides that the code boilerplate to setup the internal ifunc is also >> way more complex. > > Okay for now as long as open to changing later (not that we will but that > this isn't locking us into the decision). For sure, if iFUNC does help on this case the change should be simple for the generic code. The boilerplate is for the x86_64 bits in facts (to setup the iFUNC resolver, Makefile, etc.).