Subject: Re: [PATCHv4 2/2] powerpc64le: ifunc (almost) all *f128 routines in multiarch mode
To: Adhemerval Zanella ,
libc-alpha@sourceware.org, Tulio Magno Quites Machado Filho
From: Paul E Murphy
Date: Mon, 22 Jun 2020 18:04:25 -0500

On 6/22/20 11:57 AM, Adhemerval Zanella via Libc-alpha wrote:
>
>
> On 15/06/2020 17:59, Paul E. Murphy via Libc-alpha wrote:
>> See the Makefile changes for high level design/commentary.
>>
>> V4 changes -
>> * Drop patch to add libm_alias_exclusive_ldouble. After
>> recent refactoring of fmaf128, it showed some unfixable
>> flaws. Instead, use macro renaming for nextafterf128 to
>> generate the needed symbols, and rework.
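The V4 note swaps a new alias macro for macro renaming, and the commit message leans on asm redirects for most symbols. For readers unfamiliar with either technique, here is a minimal sketch of both in one translation unit. It assumes GCC or Clang on an ELF target. The `REDIRECT`/`ASMNAME` macros only mimic the shape of glibc's `__REDIRECT` in `<sys/cdefs.h>`, and every `hyp_*` name is hypothetical; none of this is the patch's actual float128-ifunc headers.

```c
#include <assert.h>

/* Stringify after macro expansion (cf. glibc's __STRING).  */
#define STRINGIFY(x) #x
/* Prepend the target's user label prefix ("" on ELF).  */
#define ASMNAME2(prefix, cname) STRINGIFY (prefix) cname
#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname)
/* Declare NAME with prototype PROTO but assembler symbol ALIAS.  */
#define REDIRECT(name, proto, alias) name proto __asm__ (ASMNAME (#alias))

/* Technique 1: asm redirect.  A "redirect header" included ahead of the
   generic source gives the generic name a different assembler symbol.  */
extern double REDIRECT (hyp_scale2, (double), hyp_scale2_power9);

/* Unmodified "generic" definition: the C name stays hyp_scale2, but the
   object file gets the symbol hyp_scale2_power9.  */
double
hyp_scale2 (double x)
{
  return 2.0 * x;
}

/* Technique 2: macro rename, for spots where an asm redirect is not
   possible.  This renames the C identifier itself.  */
#define hyp_half hyp_half_power9
double
hyp_half (double x)
{
  return x / 2.0;
}
#undef hyp_half

/* Callers reach both variants.  Doubling then halving is exact for any
   finite x whose double 2x does not overflow.  */
double
hyp_roundtrip (double x)
{
  double y = hyp_half_power9 (hyp_scale2 (x));
  assert (y == x);
  return y;
}
```

Running `nm` on the resulting object should list `hyp_scale2_power9` and `hyp_half_power9`, and neither `hyp_scale2` nor `hyp_half`. That is the property the generated wrappers need so that several builds of one generic source can coexist in a single libm.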
>>
>> V3 changes -
>> * Cleanup comments.
>> * Rebase against fmaf128 cleanup
>> * Use Makeconfig trick to set var in le/power9 sysdep dir to
>> determine if ifunc support is necessary. This works with
>> the upcoming CPU detection patch.
>> * fmaf128 patch is no longer needed.
>>
>> V2 changes -
>> * move duplicate redirect macros into float128-ifunc-redirect-macros.h
>> * replace subshell usage with command sequencing
>> * Add more instructive documentation in Makefile about how all
>> these ugly pieces work together
>> * Minor comment cleanup throughout
>> * Improve inline documentation/commentary throughout
>>
>> ---8<---
>>
>> Programmatically generate simple wrappers for most libm *f128
>> objects and a set of ifunc objects to unify them.
>>
>> A second set of implementation files is generated which simply
>> include the first implementation encountered along the search
>> path. This usually works, except when a wrapper is overridden
>> and makefile search order slightly diverges from include order.
>>
>> A set of additional headers are included which primarily rely
>> on asm redirects to rename, and less frequently macro renames
>> where an asm redirect is not possible. These intercept several
>> common headers to install redirect and disable macros at specific
>> times. This works surprisingly well. Notably, some ugliness
>> occurs when header inclusion must be coerced at certain times
>> before turning off aliasing and plt bypass wrappers.
>>
>> Notably, the only special case is s_significandf128.c. It is
>> doubly special as it exists to support ldouble redirects, and
>> exposes subtle differences between makefile rules and search path
>> orders. Commentary is inlined.
>>
>> Admittedly, this makes shared maintenance a tiny bit more
>> difficult, but lays groundwork for supporting more optimized
>> float128 routines which very overtly assume a soft-fp runtime.
>> Changes to internal float128 API should fail at compile time,
>> thus build-many-glibcs.py should readily catch any divergence.
>>
>> Finally, don't build this support if requested CPU is newer
>> than power8.
>>
>> fixup f128 ifunc
>>
>> drop the patch to introduce the new macro to assist simplification of
>> s_nextafter.c. It wasn't thought out well enough. Instead just add
>> the ugly macro redirections needed to generate the appropriate
>> nexttoward symbols.

This is refactoring noise; while not wrong, it is not meant to be in the final commit message.

>
> I am trying to digest the requirements to add such complexity to the
> powerpc64le build rules, especially the internal Makefile hackery
> required.

This is addressed in the notes. To put it mildly, soft-fp code generation on P8 is quite limited. The effect is easy to see in any non-trivial binary128 function: e.g. expf128 is almost 1/3 the size on P9, and many complex functions are almost 1/2 the size. Anything soft-fp touches massively increases code size and impedes instruction scheduling.

I can get some more concrete numbers, but my hope is this enables us to make even more meaningful improvements to common code when hardware support is available.

>
> So if I understood correctly, let's say we have these targets:
>
> 1. powerpc64le-linux-gnu
> 2. powerpc64le-linux-gnu with --with-cpu=power9
>
> The ifunc mechanism to build optimized versions for power9 will be
> built only for 1, while for 2 only versions that use hardware
> instructions for __float128 (-mfloat128-hardware gcc option)
> will be used.

In case 2 (and with any newer cpu), this patch is a no-op.

>
> So all the redirection machinery done in the float128-ifunc-* is to
> list and redirect internal libm symbols to their float128 counterparts.
> One initial issue is that this tends to be fragile: it requires changing
> arch-specific code when generic code is changed (for instance by
> changing the internal symbol name or the caller implementation).

The interesting symbol names are likely to see less change, and those that do should mostly be hidden via local calls. This is the price the ppc64le maintainers pay to support multiarch for a large swath of libm, and this approach greatly simplifies the most mundane and error-prone pieces.

>
> Another issue is the rule exceptions (such as s_totalorderf128) that
> require additional care to check if they result in correct code.

Those are already covered by the existing test suite.

>
> Another possible maintenance issue is keeping the exported
> symbol list in float128-ifunc.c, float128-ifunc.h, and
> float128_private.h updated for each new possible symbol in future versions.
> It again means correcting/changing arch-specific code for generic
> changes.

Note that float128-ifunc.c only defines compat symbols for the old finite entry points. That set should never grow.

>
> It also increases code size considerably, with the potential to keep
> increasing with the addition of new libm functions.

Stripping debug info, the code size increase of libm is about 220kb added to a 1210kb library. Not trivial, but not overwhelming.

>
> Finally, the question is how useful this change would be in real-world
> cases to justify this huge build and permutation complexity.

Code size is an interesting metric to measure. The P9 variants are substantially smaller where soft-fp is involved: expf128 is almost 1/3 the size.

>
> What I would expect in real-world cases is that, if the workload really
> uses float128 extensively, it is built with -mcpu=power9 and/or
> -mfloat128/-mfloat128-hardware. That should cover most of the required
> hotspots, and glibc can focus on providing only cases where adding
> a specialized ifunc variant does make sense (as for the x86_64
> sysdeps/x86_64/fpu/multiarch/mp*, for instance).
>
> Also, if an optimized float128 glibc build is paramount, a much
> simpler solution would be to just provide a -mcpu=power9 built one.

That kicks the can to the distros. I think few ship such libraries. The whole value of multiarch is to expose these benefits without having to make the end user jump through such hurdles.

I don't think the x86 comparison holds. Adding a couple of helpful instructions is tame compared to going from soft to hard fp.
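To make the multiarch argument concrete: with an IFUNC, one libm binary binds each *f128 symbol to a hardware-float128 variant on POWER9 and to the soft-fp variant on POWER8, while a --with-cpu=power9 build skips dispatch entirely (the no-op case 2 above). The sketch below models that selection with a plain function pointer so it runs on any host. It is not the patch's resolver. `double` stands in for `_Float128`, all `hyp_*` names are hypothetical, and the feature bit value is an assumption. On real hardware the bit would come from the AT_HWCAP2 auxiliary-vector entry (for example `PPC_FEATURE2_ARCH_3_00` from glibc's powerpc `<bits/hwcap.h>`). `_ARCH_PWR9` is the macro GCC predefines when targeting POWER9 or newer; on other hosts it is undefined, so the runtime path is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hwcap2 bit meaning "ISA 3.0 (POWER9) or newer".  */
#define HYP_FEATURE2_POWER9 (1UL << 23)

typedef double (*hyp_unary_fn) (double);

/* Records which variant ran last (8 or 9), purely for illustration.  */
static int hyp_last_variant;

/* Stand-ins for two builds of one generic source: a soft-fp build and a
   -mcpu=power9 build.  */
static double
hyp_cube_power8 (double x)
{
  hyp_last_variant = 8;
  return x * x * x;
}

static double
hyp_cube_power9 (double x)
{
  hyp_last_variant = 9;
  return x * x * x;
}

/* Model of an IFUNC resolver: inspect CPU features once and return the
   implementation later calls will use.  */
static hyp_unary_fn
hyp_cube_resolve (unsigned long hwcap2)
{
#if defined _ARCH_PWR9
  /* Built for POWER9 or newer: hardware float128 is always present,
     so no runtime dispatch is needed.  */
  (void) hwcap2;
  return hyp_cube_power9;
#else
  return (hwcap2 & HYP_FEATURE2_POWER9) ? hyp_cube_power9 : hyp_cube_power8;
#endif
}

/* Plays the role of the GOT slot the dynamic loader fills in.  */
static hyp_unary_fn hyp_cube_impl;

void
hyp_init (unsigned long hwcap2)
{
  hyp_cube_impl = hyp_cube_resolve (hwcap2);
  assert (hyp_cube_impl != NULL);
}

/* Public entry point: every call goes through the selected variant.  */
double
hyp_cube (double x)
{
  return hyp_cube_impl (x);
}
```

In real code the resolver would be attached with GCC's `__attribute__ ((ifunc ("resolver")))` (glibc wraps this in its own internal macros). The dynamic loader runs it when the symbol is bound, so callers pay only the usual PLT indirection and no per-call feature test.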