From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2d.google.com (mail-oa1-x2d.google.com [IPv6:2001:4860:4864:20::2d]) by sourceware.org (Postfix) with ESMTPS id 0C6503858430 for ; Thu, 14 Apr 2022 17:03:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0C6503858430 Received: by mail-oa1-x2d.google.com with SMTP id 586e51a60fabf-e2fa360f6dso5885064fac.2 for ; Thu, 14 Apr 2022 10:03:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=XCxbXhEsd7bYFK87of8aPasY+an5sr052sTvL8TKqDk=; b=AyDav4jR7mns/ZkzQQiOlc5MqgtmbjbI1jAQI8lt3ig7P4/WDsApyZLa2SUekRUWbQ qYcyxhTX0fyfzYvrkD4zyzF680fpDyH5Q7KptwAw4Fg83jdxOwMgko7zk5IUvAtVC5IV k2Ha/3uHEX4fOaR18+DqL34JlXsbyr5ymA7BJrQqOtvzyzOPgfRh3KArBmpId71KIs7z GaYXiv688SsWKfAU3yzTpoZNpprf4AaP+VJN5T5kFACQ2iHtjUJQTZlvNFOZo1SgN7RL /cIAhRG2+6E2grgLkVWjonM2cwKRWMIduGeI7iwRh3IzE08OIo9znfRlwTB2dJASshrB cAJQ== X-Gm-Message-State: AOAM53331oXwt6Hc/yhBVnTCyAN2LrH8ez5sR/f2Hssc51BIOhCqEEhk 62w2ccAYLrN6Tn0H6hwhR7lJHev+n8PBRw== X-Google-Smtp-Source: ABdhPJyAtBb0FoV7nr5up2wYNxV2AJ4ApePY+MRKOsdzI/00bVjAA6aUgd9aNZTd9nlzEsmXb9bE7Q== X-Received: by 2002:a05:6870:e989:b0:da:c099:7939 with SMTP id r9-20020a056870e98900b000dac0997939mr1980087oao.206.1649955785107; Thu, 14 Apr 2022 10:03:05 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:431f:3dc9:7133:8dac:5273? ([2804:431:c7ca:431f:3dc9:7133:8dac:5273]) by smtp.gmail.com with ESMTPSA id l14-20020a056820030e00b00320edaf9b8esm187146ooe.44.2022.04.14.10.03.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 14 Apr 2022 10:03:04 -0700 (PDT) Message-ID: Date: Thu, 14 Apr 2022 14:03:02 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH 4/7] x86: Add SSSE3 optimized chacha20 Content-Language: en-US To: Noah Goldstein Cc: GNU C Library References: <20220413202401.408267-1-adhemerval.zanella@linaro.org> <20220413202401.408267-5-adhemerval.zanella@linaro.org> From: Adhemerval Zanella In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-13.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Apr 2022 17:03:09 -0000 On 13/04/2022 20:12, Noah Goldstein wrote: > On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha > wrote: >> >> + >> + /* eax zeroed by round loop. */ >> + leave; >> + cfi_adjust_cfa_offset(-8) >> + cfi_def_cfa_register(%rsp); >> + ret; >> + int3; > why int3? It was originally added on libgcrypt by 11ade08efbfbc36dbf3571f1026946269950bc40, as a straight-line speculation hardening. It is was is emitted by clang 14 and gcc 12 with -mharden-sls=return. I am not sure if we need that kind of hardening, but I would prefer to the first version be in sync with libgcrypt as much as possible so the future optimizations would be simpler to keep localized to glibc (if libgcrypt does not want to backport it). >> +END (__chacha20_ssse3_blocks8) >> diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h >> new file mode 100644 >> index 0000000000..37a4fdfb1f >> --- /dev/null >> +++ b/sysdeps/x86_64/chacha20_arch.h >> @@ -0,0 +1,42 @@ >> +/* Chacha20 implementation, used on arc4random. >> + Copyright (C) 2022 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#include >> +#include >> +#include >> + >> +unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst, >> + const uint8_t *src, size_t nblks); >> + >> +static inline void >> +chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src, >> + size_t bytes) >> +{ >> + if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4) > > Can we make this an ifunc? I though about it, but if you check on arc4random implementation the chacha20_crypt is called for the whole internal buf once it is exhausted. Assuming a 1 cycle per byte (as indicated by bench-slope libgrcypt on my machine), it will be at least 1k cycles to encrypt each block. I am not sure if setting up an internal PLT call to save a couple of cycles on a internal function will really show anything significant here (assuming that the PLT call won't add more overhead in fact). Besides that the code boilerplate to setup the internal ifunc is also way more complex.