Date: Wed, 1 May 2024 22:59:37 -0400
From: Michael Meissner
To: Peter Bergner
Cc: Zack Weinberg, Richard Henderson, libc-alpha@sourceware.org, Michael Meissner
Subject: Re: Maybe we should get rid of ifuncs
In-Reply-To: <71a749ba-d843-424a-9a41-1d20f6be685c@linux.ibm.com>

On Sat, Apr 27, 2024 at 07:24:05PM -0500, Peter Bergner wrote:
> On 4/24/24 9:43 AM, Zack Weinberg wrote:
> > I'm very curious what the plan for function multiversioning in GCC
> > and LLVM is, and how close to declarative it gets.
>
> GCC (at least on powerpc) already supports it via the target_clones
> attribute.  See gcc/testsuite/gcc.target/powerpc/clone*.c for examples.
> Basically, it looks like (from clone3.c):
>
>   __attribute__((target_clones("cpu=power10,cpu=power9,default")))
>   long mod_func (long a, long b)
>   {
>     return (a % b) + s;
>   }
>
>   long mod_func_or (long a, long b, long c)
>   {
>     return mod_func (a, b) | c;
>   }
>
>
> Mike knows how this works better than I, but GCC automatically emits an
> ifunc resolver for the different clones and looks to use the HWCAP*
> architecture mask associated with the cpu we're compiling for.
> The "default" function being called in the case our ifunc resolver
> doesn't match any of the HWCAP* masks from the cpus we're compiling
> for.

Sorry, I've been in and out of the hospital with my wife.

> Mike, it seems like this is more of a "cpu" clone and not a true HWCAP
> test, so this specific thing doesn't (at least currently) work for
> something like __attribute__((target_clones("vsx,mma,default"))) ?
> Or did I misread the code?

There are 3 things GCC provides:

1) The ability to write an ifunc function.  Any call to func is always
   indirect.  The loader calls resolver at program/shared library load
   time to get the address of the function to use:

    extern int func_power10 (void);
    extern int func_power9 (void);
    extern int func_default (void);

    int func (void) __attribute__ ((__ifunc__ ("resolver")));

    void *
    resolver (void)
    {
      if (__builtin_cpu_supports ("arch_3_1"))
        return (void *) func_power10;

      else if (__builtin_cpu_supports ("arch_3_00"))
        return (void *) func_power9;

      else
        return (void *) func_default;
    }

2) The ability to change the target defaults for a particular function:

    int func_power10 (void) __attribute__((__target__("cpu=power10")));

    int func_power10 (void)
    {
      // this function will be compiled for power10
    }

   GCC accepts the names inside __attribute__ either with or without 2
   leading and 2 trailing underscores.  I prefer to always use the
   underscore prefixes and suffixes just in case the user defined a
   'target' macro (i.e. the stuff within attributes is subject to macro
   replacement).

   An alternative is to use #pragmas to change the defaults for a
   stretch of code:

    #pragma GCC push_options
    #pragma GCC target ("cpu=power10")

    int func_power10 (void)
    {
      // compiled with power10 options
    }

    #pragma GCC target ("cpu=power9")

    int func_power9 (void)
    {
      // compiled with power9 options
    }

    #pragma GCC pop_options

    int func_default (void)
    {
      // compiled with the default options
    }

3) The ability to use target clones, where the compiler constructs the
   ifunc function, and recompiles the function multiple times with
   different target defaults.

    extern int func (void)
      __attribute__((__target_clones__("cpu=power10,cpu=power9,default")));

    int func (void)
    {
      // 3 versions of func are compiled along with an ifunc resolver.
    }

Note, 'default' must always be listed in the target clones.

You can only specify one option per clone (i.e. you can't combine
something like -mcpu=power9 and -mtune=power10 in a single clone).  So
in practice, only -mcpu= options are useful.  If we need finer-grained
support, we could have -mcpu options that add or subtract individual
options.

The automatically generated ifunc resolver only looks at hwcap/hwcap2
bits, and it sorts the clones so that it checks for power10 first, etc.

At present, we have target clone support for:

    power6
    power7
    power8
    power9
    power10

Note that there is no real hwcap bit for power11, so with my current
patches for power11, if you do:

    extern int func (void)
      __attribute__((__target_clones__("cpu=power11,cpu=power10,cpu=power9,default")));

it will compile both power11 and power10 clones, but the resolver will
only ever call the power10 clone, never the power11 one, because we
don't have a separate hwcap bit for power11 (that I know of).  If we do
have a separate hwcap bit, it is easy to add support for power11.

One more thing: I thought the predefined macros used in #ifdefs (i.e.
_ARCH_PWR10, etc.) were updated when compiling each target clone, but it
appears that is no longer being done, so they aren't changed for the
clone.

>
> I'll note I'm pretty sure we (IBM/powerpc) have added ifunc usage to
> OpenBLAS and some other libraries outside of glibc.
>
>
> Peter
>
>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com