From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 6E0403858C83 for ; Fri, 22 Apr 2022 21:09:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6E0403858C83 Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 23MHZHmJ022133; Fri, 22 Apr 2022 21:09:36 GMT Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com [169.55.85.253]) by mx0a-001b2d01.pphosted.com with ESMTP id 3fkseq5728-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 22 Apr 2022 21:09:36 +0000 Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1]) by ppma01wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 23ML9GQ1003538; Fri, 22 Apr 2022 21:09:35 GMT Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by ppma01wdc.us.ibm.com with ESMTP id 3ffneawf03-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 22 Apr 2022 21:09:35 +0000 Received: from b03ledav005.gho.boulder.ibm.com (b03ledav005.gho.boulder.ibm.com [9.17.130.236]) by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 23ML9Ywi12124740 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 22 Apr 2022 21:09:34 GMT Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6A0C5BE053; Fri, 22 Apr 2022 21:09:34 +0000 (GMT) Received: from b03ledav005.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 168E8BE04F; Fri, 22 Apr 2022 21:09:34 +0000 (GMT) Received: from [9.160.26.230] (unknown [9.160.26.230]) by b03ledav005.gho.boulder.ibm.com (Postfix) with ESMTP; Fri, 22 Apr 2022 21:09:33 +0000 (GMT) Message-ID: <2ddad9a6-ad99-7c9b-f8c8-638658d069c9@linux.ibm.com> Date: Fri, 22 Apr 2022 16:09:33 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20 Content-Language: en-US To: Adhemerval Zanella , libc-alpha@sourceware.org References: <20220419212812.2688764-1-adhemerval.zanella@linaro.org> <20220419212812.2688764-8-adhemerval.zanella@linaro.org> From: Paul E Murphy In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed X-TM-AS-GCONF: 00 X-Proofpoint-GUID: havpNjDY5h4eJbAFu-5uVIyf7WSZXrGz X-Proofpoint-ORIG-GUID: havpNjDY5h4eJbAFu-5uVIyf7WSZXrGz Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.858,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-04-22_06,2022-04-22_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 lowpriorityscore=0 malwarescore=0 spamscore=0 mlxlogscore=999 bulkscore=0 priorityscore=1501 phishscore=0 clxscore=1015 suspectscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2204220088 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Apr 2022 21:09:40 -0000 On 4/20/22 2:23 PM, Adhemerval Zanella wrote: > > > On 20/04/2022 15:38, Paul E Murphy wrote: >> >> >> On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote: >>> It adds vectorized ChaCha20 implementation based on libgcrypt >>> cipher/chacha20-ppc.c.  It targets POWER8 and it is used on >>> default for LE. >> >>> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c >>> new file mode 100644 >>> index 0000000000..e2567c379a >>> --- /dev/null >>> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c >> >> How difficult is it to keep this synchronized with the upstream version in libgcrypt?  Also, this seems like it would be a better placed in the power8 subdirectory. > > It would be somewhat complicate because libgcrypt also implements the > poly1305 on the same file (which uses common macros and definition > for chacha20) and it adds final XOR based on input stream (which > for arc4random usage is not required since it does not add any > hardening). > > It would require to refactor libgcrypt code a bit to split the > chacha and poly1305 and to add a macro to XOR the input. I think this is OK. Thanks for the explanation. > >> >>> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h >>> new file mode 100644 >>> index 0000000000..a18115392f >>> --- /dev/null >>> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h >>> @@ -0,0 +1,47 @@ >>> +/* PowerPC optimization for ChaCha20. >>> +   Copyright (C) 2022 Free Software Foundation, Inc. >>> +   This file is part of the GNU C Library. >>> + >>> +   The GNU C Library is free software; you can redistribute it and/or >>> +   modify it under the terms of the GNU Lesser General Public >>> +   License as published by the Free Software Foundation; either >>> +   version 2.1 of the License, or (at your option) any later version. >>> + >>> +   The GNU C Library is distributed in the hope that it will be useful, >>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of >>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU >>> +   Lesser General Public License for more details. >>> + >>> +   You should have received a copy of the GNU Lesser General Public >>> +   License along with the GNU C Library; if not, see >>> +   .  */ >>> + >>> +#include >>> +#include >>> + >>> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, >>> +                    const uint8_t *src, size_t nblks) >>> +     attribute_hidden; >>> + >>> +static void >>> +chacha20_crypt (uint32_t *state, uint8_t *dst, >>> +        const uint8_t *src, size_t bytes) >>> +{ >>> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0, >>> +          "CHACHA20_BUFSIZE not multiple of 4"); >>> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, >>> +          "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); >>> + >>> +#ifdef __LITTLE_ENDIAN__ >>> +  __chacha20_power8_blocks4 (state, dst, src, >>> +                 CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); >>> +#else >>> +  unsigned long int hwcap = GLRO(dl_hwcap); >>> +  unsigned long int hwcap2 = GLRO(dl_hwcap2); >>> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) >>> +    __chacha20_power8_blocks4 (state, dst, src, >>> +                   CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); >>> +  else >>> +    chacha20_crypt_generic (state, dst, src, bytes); >>> +#endif >> >> This file doesn't seem to obey the multiarch conventions of other powerpc64 specific bits. Is it possible to implement multiarch support similar to the libc/libm routines? > > I am not very found of the powerpc multiarch convention and it would > require some more boilerplate code to handle BE, but it is doable. > > So LE will continue to use __chacha20_power8_blocks4 as > default, while BE will just select if --with-arch=power8 is defined > for for default build. With --disable-multi-arch the power8 will be > select iff --with-arch=power8 is set. > > --- > > diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile > index 18943ef09e..679d5e49ba 100644 > --- a/sysdeps/powerpc/powerpc64/Makefile > +++ b/sysdeps/powerpc/powerpc64/Makefile > @@ -66,9 +66,6 @@ tst-setjmp-bug21895-static-ENV = \ > endif > > ifeq ($(subdir),stdlib) > -sysdep_routines += chacha20-ppc > -CFLAGS-chacha20-ppc.c += -mcpu=power8 > - > CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec > tests += tst-ucontext-ppc64-vscr > endif > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > new file mode 100644 > index 0000000000..8c75165f7f > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile > @@ -0,0 +1,4 @@ > +ifeq ($(subdir),stdlib) > +sysdep_routines += chacha20-ppc > +CFLAGS-chacha20-ppc.c += -mcpu=power8 > +endif > diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > new file mode 100644 > index 0000000000..cf9e735326 > --- /dev/null > +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c > @@ -0,0 +1 @@ > +#include > diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > similarity index 92% > rename from sysdeps/powerpc/powerpc64/chacha20_arch.h > rename to sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > index a18115392f..6d2762d82b 100644 > --- a/sysdeps/powerpc/powerpc64/chacha20_arch.h > +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h > @@ -32,10 +32,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst, > _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, > "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); > > -#ifdef __LITTLE_ENDIAN__ > - __chacha20_power8_blocks4 (state, dst, src, > - CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > -#else > unsigned long int hwcap = GLRO(dl_hwcap); > unsigned long int hwcap2 = GLRO(dl_hwcap2); > if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) > @@ -43,5 +39,4 @@ chacha20_crypt (uint32_t *state, uint8_t *dst, > CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); > else > chacha20_crypt_generic (state, dst, src, bytes); > -#endif > } > diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile > index 71a59529f3..abb0aa3f11 100644 > --- a/sysdeps/powerpc/powerpc64/power8/Makefile > +++ b/sysdeps/powerpc/powerpc64/power8/Makefile > @@ -1,3 +1,8 @@ > ifeq ($(subdir),string) > sysdep_routines += strcasestr-ppc64 > endif > + > +ifeq ($(subdir),stdlib) > +sysdep_routines += chacha20-ppc > +CFLAGS-chacha20-ppc.c += -mcpu=power8 Is it required to specify mcpu=power8 here? I am thinking about the case of building glibc for power9 (or newer), which could benefit from improved instruction selection when using the VSX builtins. I think this is improved over V3, and seems OK. Thanks. It would be nice to refactor the multiarch/multi-cpu code on powerpc, I agree it is not ideal in its current implementation.