Message-ID: <9b051f4e-0b42-d9bf-866a-9b2e080ca073@linaro.org>
Date: Thu, 14 Apr 2022 15:25:24 -0300
Subject: Re: [PATCH 4/7] x86: Add SSSE3 optimized chacha20
To: Noah Goldstein
Cc: GNU C Library
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org> <20220413202401.408267-5-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella

On 14/04/2022 14:22, Noah Goldstein wrote:
> On Thu, Apr 14, 2022 at 12:19 PM Adhemerval Zanella wrote:
>>
>> On 14/04/2022 14:10, Noah Goldstein wrote:
>>> On Thu, Apr 14, 2022 at 12:03 PM Adhemerval Zanella wrote:
>>>>
>>>> On 13/04/2022 20:12, Noah Goldstein wrote:
>>>>> On Wed, Apr 13, 2022 at 1:27 PM Adhemerval Zanella via Libc-alpha wrote:
>>>>>>
>>>>>> +
>>>>>> +       /* eax zeroed by round loop.  */
>>>>>> +       leave;
>>>>>> +       cfi_adjust_cfa_offset(-8)
>>>>>> +       cfi_def_cfa_register(%rsp);
>>>>>> +       ret;
>>>>>> +       int3;
>>>>> why int3?
>>>>
>>>> It was originally added to libgcrypt by 11ade08efbfbc36dbf3571f1026946269950bc40,
>>>> as a straight-line speculation hardening.  It is what is emitted by clang 14 and
>>>> gcc 12 with -mharden-sls=return.
>>>>
>>>> I am not sure if we need that kind of hardening, but I would prefer the first
>>>> version to be in sync with libgcrypt as much as possible so that future
>>>> optimizations would be simpler to keep localized to glibc (if libgcrypt does
>>>> not want to backport it).
>>>
>>> Okay, can keep for now. Any thoughts on changing it to sse2?
>>>
>>
>> No strong feeling, I used the ssse3 one because it is readily available from
>> libgcrypt.
>
> I think the only ssse3 instruction is `pshufb` so you can just replace the
> optimized rotates with the shift rotates and that will make it sse2 (unless
> I'm missing an instruction).

Right, do you have a patch for it?  I can add it to the v2 I will send.

>
> Also can you add the proper .text section here as well (or .sse2 or .ssse3)

Ack.
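
For reference, a minimal sketch of the shift-based rotate suggested above, written
with C intrinsics rather than the assembly used in the patch.  It is not taken
from glibc or libgcrypt: the names rotl32_sse2, rotl32_scalar and rotl32x4 are
hypothetical, and the scalar branch is only there so the sketch still compiles
on hosts without SSE2.  ChaCha20 rotates by 16, 12, 8 and 7 bits; only the
byte-aligned rotates (16 and 8) can be done with a single `pshufb`, so
presumably those are the ones that would change.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics only; nothing from SSSE3.  */

/* Rotate each 32-bit lane of V left by N bits (0 < N < 32) using a
   left shift, a right shift and an OR (pslld/psrld/por).  */
static inline __m128i
rotl32_sse2 (__m128i v, int n)
{
  return _mm_or_si128 (_mm_slli_epi32 (v, n), _mm_srli_epi32 (v, 32 - n));
}
#endif

/* Scalar rotate (0 < N < 32), used as a reference and as the fallback.  */
static inline uint32_t
rotl32_scalar (uint32_t x, int n)
{
  return (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: rotate four 32-bit words left by N bits.  */
static void
rotl32x4 (const uint32_t in[4], uint32_t out[4], int n)
{
#if defined(__SSE2__)
  __m128i v = _mm_loadu_si128 ((const __m128i *) in);
  _mm_storeu_si128 ((__m128i *) out, rotl32_sse2 (v, n));
#else
  for (int i = 0; i < 4; i++)
    out[i] = rotl32_scalar (in[i], n);
#endif
}
```

The shift version costs three instructions per rotate instead of one shuffle,
but it needs no shuffle-mask constants and runs on any x86_64 CPU.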