Date: Wed, 20 Apr 2022 16:23:01 -0300
Subject: Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20
To: Paul E Murphy, libc-alpha@sourceware.org
From: Adhemerval Zanella
References: <20220419212812.2688764-1-adhemerval.zanella@linaro.org> <20220419212812.2688764-8-adhemerval.zanella@linaro.org>

On 20/04/2022 15:38, Paul E Murphy wrote:
>
>
> On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote:
>> It adds vectorized ChaCha20 implementation based on libgcrypt
>> cipher/chacha20-ppc.c.  It targets POWER8 and it is used on
>> default for LE.
>
>> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
>> new file mode 100644
>> index 0000000000..e2567c379a
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
>
> How difficult is it to keep this synchronized with the upstream version in libgcrypt?
> Also, this seems like it would be better placed in the power8 subdirectory.

It would be somewhat complicated, because libgcrypt also implements
poly1305 in the same file (sharing common macros and definitions with
chacha20) and it adds a final XOR with the input stream (which the
arc4random usage does not require, since it does not add any hardening).
It would require refactoring the libgcrypt code a bit to split chacha and
poly1305, and adding a macro to XOR the input.

>
>> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>> new file mode 100644
>> index 0000000000..a18115392f
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
>> @@ -0,0 +1,47 @@
>> +/* PowerPC optimization for ChaCha20.
>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   .
*/
>> +
>> +#include
>> +#include
>> +
>> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
>> +                    const uint8_t *src, size_t nblks)
>> +     attribute_hidden;
>> +
>> +static void
>> +chacha20_crypt (uint32_t *state, uint8_t *dst,
>> +        const uint8_t *src, size_t bytes)
>> +{
>> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
>> +          "CHACHA20_BUFSIZE not multiple of 4");
>> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
>> +          "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
>> +
>> +#ifdef __LITTLE_ENDIAN__
>> +  __chacha20_power8_blocks4 (state, dst, src,
>> +                 CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>> +#else
>> +  unsigned long int hwcap = GLRO(dl_hwcap);
>> +  unsigned long int hwcap2 = GLRO(dl_hwcap2);
>> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
>> +    __chacha20_power8_blocks4 (state, dst, src,
>> +                   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
>> +  else
>> +    chacha20_crypt_generic (state, dst, src, bytes);
>> +#endif
>
> This file doesn't seem to obey the multiarch conventions of other powerpc64 specific bits. Is it possible to implement multiarch support similar to the libc/libm routines?

I am not very fond of the powerpc multiarch convention, and it would
require some more boilerplate code to handle BE, but it is doable.

So LE will continue to use __chacha20_power8_blocks4 by default, while BE
will just select it if --with-arch=power8 is defined for the default build.
With --disable-multi-arch, the power8 variant will be selected iff
--with-arch=power8 is set.
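To make the selection rules above easier to check against the patch below, here is a minimal sketch of how I read them, written as a plain decision function. It is only a model, not glibc code: every name in it (sketch_config, sketch_select, the SKETCH_* constants) is hypothetical, the feature bits are stand-ins rather than the real PPC_FEATURE* values, and the runtime hwcap branch is taken from the be/multiarch/chacha20_arch.h header in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hwcap bits.  The real
   PPC_FEATURE_HAS_ALTIVEC and PPC_FEATURE2_ARCH_2_07 values come from the
   system headers and are deliberately not reproduced here.  */
#define SKETCH_HAS_ALTIVEC 0x1UL
#define SKETCH_ARCH_2_07   0x2UL

enum sketch_impl { SKETCH_GENERIC, SKETCH_POWER8 };

/* Build-time and run-time inputs to the choice (all names hypothetical).  */
struct sketch_config
{
  bool little_endian;      /* LE build.  */
  bool with_arch_power8;   /* Configured with --with-arch=power8.  */
  bool multiarch;          /* false models --disable-multi-arch.  */
  unsigned long int hwcap;
  unsigned long int hwcap2;
};

static enum sketch_impl
sketch_select (const struct sketch_config *c)
{
  /* LE, or an explicit power8 build: the power8 variant is used
     unconditionally (power8/chacha20_arch.h in the patch).  */
  if (c->little_endian || c->with_arch_power8)
    return SKETCH_POWER8;

  /* BE multiarch build: decide at run time from the hwcap bits
     (be/multiarch/chacha20_arch.h in the patch).  */
  if (c->multiarch
      && (c->hwcap2 & SKETCH_ARCH_2_07)
      && (c->hwcap & SKETCH_HAS_ALTIVEC))
    return SKETCH_POWER8;

  return SKETCH_GENERIC;
}

/* Usage examples for the cases discussed in the reply.  */
static void
sketch_examples (void)
{
  struct sketch_config le = { true, false, true, 0, 0 };
  struct sketch_config be_nomulti = { false, false, false,
                                      SKETCH_HAS_ALTIVEC, SKETCH_ARCH_2_07 };
  assert (sketch_select (&le) == SKETCH_POWER8);
  assert (sketch_select (&be_nomulti) == SKETCH_GENERIC);
}
```

The point of the sketch is only that the unconditional path and the hwcap-checked path end up in separate headers, so the #ifdef __LITTLE_ENDIAN__ block from v3 is no longer needed.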
---

diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile
index 18943ef09e..679d5e49ba 100644
--- a/sysdeps/powerpc/powerpc64/Makefile
+++ b/sysdeps/powerpc/powerpc64/Makefile
@@ -66,9 +66,6 @@ tst-setjmp-bug21895-static-ENV = \
 endif
 
 ifeq ($(subdir),stdlib)
-sysdep_routines += chacha20-ppc
-CFLAGS-chacha20-ppc.c += -mcpu=power8
-
 CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec
 tests += tst-ucontext-ppc64-vscr
 endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
new file mode 100644
index 0000000000..8c75165f7f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
new file mode 100644
index 0000000000..cf9e735326
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
@@ -0,0 +1 @@
+#include
diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
similarity index 92%
rename from sysdeps/powerpc/powerpc64/chacha20_arch.h
rename to sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
index a18115392f..6d2762d82b 100644
--- a/sysdeps/powerpc/powerpc64/chacha20_arch.h
+++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
@@ -32,10 +32,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
   _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
                   "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
 
-#ifdef __LITTLE_ENDIAN__
-  __chacha20_power8_blocks4 (state, dst, src,
-                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
-#else
   unsigned long int hwcap = GLRO(dl_hwcap);
   unsigned long int hwcap2 = GLRO(dl_hwcap2);
   if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
@@ -43,5 +39,4 @@ chacha20_crypt (uint32_t *state, uint8_t *dst,
                                CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
   else
     chacha20_crypt_generic (state, dst, src, bytes);
-#endif
 }
diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile
index 71a59529f3..abb0aa3f11 100644
--- a/sysdeps/powerpc/powerpc64/power8/Makefile
+++ b/sysdeps/powerpc/powerpc64/power8/Makefile
@@ -1,3 +1,8 @@
 ifeq ($(subdir),string)
 sysdep_routines += strcasestr-ppc64
 endif
+
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-ppc
+CFLAGS-chacha20-ppc.c += -mcpu=power8
+endif
diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
similarity index 100%
rename from sysdeps/powerpc/powerpc64/chacha20-ppc.c
rename to sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
new file mode 100644
index 0000000000..270c71130f
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
@@ -0,0 +1,37 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+                                        const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+                  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
+                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
+
+  __chacha20_power8_blocks4 (state, dst, src,
+                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+}
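
As a footnote to the libgcrypt synchronization question earlier in this reply, the difference in the final step can be illustrated roughly as follows. This is a toy sketch, not code from libgcrypt or glibc: the function names are hypothetical, and the keystream is just a caller-supplied buffer rather than real ChaCha20 output. It shows a cipher-style routine that XORs the keystream with an input stream, next to a generator-style routine that emits the keystream directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cipher-style output: dst = src XOR keystream.  This models the final
   XOR with the input stream mentioned above.  */
static void
sketch_xor_output (uint8_t *dst, const uint8_t *src,
                   const uint8_t *keystream, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] = src[i] ^ keystream[i];
}

/* Generator-style output: the keystream itself is the result, so no
   input stream is read at all.  */
static void
sketch_keystream_output (uint8_t *dst, const uint8_t *keystream, size_t len)
{
  memcpy (dst, keystream, len);
}

/* Self-check: with an all-zero input the two forms produce the same
   bytes, because x XOR 0 == x.  */
static void
sketch_check (void)
{
  const uint8_t ks[4] = { 0xde, 0xad, 0xbe, 0xef };
  const uint8_t zero[4] = { 0 };
  uint8_t a[4], b[4];

  sketch_xor_output (a, zero, ks, sizeof a);
  sketch_keystream_output (b, ks, sizeof b);
  assert (memcmp (a, b, sizeof a) == 0);
}
```

A shared source would therefore need the XOR step to be optional, which is the macro mentioned above.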