Date: Wed, 20 Apr 2022 13:38:40 -0500
Subject: Re: [PATCH v3 7/9] powerpc64: Add optimized chacha20
From: Paul E Murphy
To: Adhemerval Zanella, libc-alpha@sourceware.org
References: <20220419212812.2688764-1-adhemerval.zanella@linaro.org> <20220419212812.2688764-8-adhemerval.zanella@linaro.org>
In-Reply-To: <20220419212812.2688764-8-adhemerval.zanella@linaro.org>

On 4/19/22 4:28 PM, Adhemerval Zanella via Libc-alpha wrote:
> It adds vectorized ChaCha20 implementation based on libgcrypt
> cipher/chacha20-ppc.c. It targets POWER8 and it is used on
> default for LE.
> diff --git a/sysdeps/powerpc/powerpc64/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/chacha20-ppc.c
> new file mode 100644
> index 0000000000..e2567c379a
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/chacha20-ppc.c

How difficult is it to keep this synchronized with the upstream version
in libgcrypt? Also, this seems like it would be better placed in the
power8 subdirectory.

> diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
> new file mode 100644
> index 0000000000..a18115392f
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
> @@ -0,0 +1,47 @@
> +/* PowerPC optimization for ChaCha20.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +
> +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
> +                                        const uint8_t *src, size_t nblks)
> +     attribute_hidden;
> +
> +static void
> +chacha20_crypt (uint32_t *state, uint8_t *dst,
> +                const uint8_t *src, size_t bytes)
> +{
> +  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
> +                  "CHACHA20_BUFSIZE not multiple of 4");
> +  _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4,
> +                  "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4");
> +
> +#ifdef __LITTLE_ENDIAN__
> +  __chacha20_power8_blocks4 (state, dst, src,
> +                             CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> +#else
> +  unsigned long int hwcap = GLRO(dl_hwcap);
> +  unsigned long int hwcap2 = GLRO(dl_hwcap2);
> +  if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC)
> +    __chacha20_power8_blocks4 (state, dst, src,
> +                               CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
> +  else
> +    chacha20_crypt_generic (state, dst, src, bytes);
> +#endif

This file doesn't seem to follow the multiarch conventions used by the
other powerpc64-specific bits. Is it possible to implement multiarch
support similar to the libc/libm routines?
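
To make the question concrete, here is a rough sketch of the general idea: pick the implementation once from the hardware capability bits, then call through the selected variant instead of testing hwcap on every call. It is only an illustration. It uses a plain function pointer rather than glibc's IFUNC selector machinery, every `demo_` name is hypothetical, the two `DEMO_FEATURE*` macros are stand-ins for `PPC_FEATURE_HAS_ALTIVEC` and `PPC_FEATURE2_ARCH_2_07` with values assumed for this example, and the two variants are placeholders that do no cipher work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PPC_FEATURE_HAS_ALTIVEC and PPC_FEATURE2_ARCH_2_07.
   The values are assumptions made for this self-contained sketch.  */
#define DEMO_FEATURE_HAS_ALTIVEC 0x10000000UL
#define DEMO_FEATURE2_ARCH_2_07  0x80000000UL

typedef void (*demo_chacha20_fn) (uint32_t *state, uint8_t *dst,
                                  const uint8_t *src, size_t bytes);

/* Records which placeholder ran last: 0 = none, 1 = generic, 2 = power8.  */
static int demo_last_variant;

/* Placeholder for the generic C implementation (no cipher work here).  */
static void
demo_chacha20_generic (uint32_t *state, uint8_t *dst,
                       const uint8_t *src, size_t bytes)
{
  (void) state; (void) dst; (void) src; (void) bytes;
  demo_last_variant = 1;
}

/* Placeholder for the POWER8 vector implementation (no cipher work here).  */
static void
demo_chacha20_power8 (uint32_t *state, uint8_t *dst,
                      const uint8_t *src, size_t bytes)
{
  (void) state; (void) dst; (void) src; (void) bytes;
  demo_last_variant = 2;
}

/* Selector: same condition as the quoted patch, evaluated once.  */
static demo_chacha20_fn
demo_chacha20_select (unsigned long int hwcap, unsigned long int hwcap2)
{
  if ((hwcap2 & DEMO_FEATURE2_ARCH_2_07)
      && (hwcap & DEMO_FEATURE_HAS_ALTIVEC))
    return demo_chacha20_power8;
  return demo_chacha20_generic;
}

/* Variant chosen at initialization time.  */
static demo_chacha20_fn demo_chacha20_impl;

static void
demo_chacha20_init (unsigned long int hwcap, unsigned long int hwcap2)
{
  demo_chacha20_impl = demo_chacha20_select (hwcap, hwcap2);
}

/* Callers use this wrapper and never look at hwcap themselves.  */
static void
demo_chacha20_crypt (uint32_t *state, uint8_t *dst,
                     const uint8_t *src, size_t bytes)
{
  assert (demo_chacha20_impl != NULL);
  demo_chacha20_impl (state, dst, src, bytes);
}
```

In a real program the arguments to `demo_chacha20_init` would come from the capability words the loader already has (the quoted patch reads them via `GLRO(dl_hwcap)` and `GLRO(dl_hwcap2)`); the sketch takes them as parameters so that both paths can be exercised on any machine.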