From: Wilco Dijkstra
To: Adhemerval Zanella, "Paul A. Clarke"
CC: "libc-alpha@sourceware.org", Tulio Magno Quites Machado Filho
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
Date: Thu, 11 Nov 2021 17:05:01 +0000

Hi Adhemerval,

> On 10/11/2021 11:34, Wilco Dijkstra wrote:
>> I think the new algorithm will always be slower due to the dependent sqrt and
>> division. So it's hard to improve unless we only use it for special cases (eg. when
>> ax and ay are close). Returning sqrt (fma (ax, ax, ay * ay)) is about twice as fast
>> and gives just over 1 ULP, so we're losing a lot of performance for a small ULP
>> improvement.
>
> My main drive for this change is remove the arch-specific implementation in
> favor of an implementation that might be optimized better by the compiler
> without the need to extra hacks by arch-specific hooks (as I did for power7).

I'm all for having a single optimized generic implementation like we did for other
math functions. In general there is little scope for compiler optimizations due to
conservative FP settings - it is all down to highly optimizing both the algorithm
and implementation.

> Another option is to use the powerpc implementation which favor FP over integer
> as the default one.

That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
and ~1.21ULP without FMA), so I'm not sure that would be acceptable.

I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.

Cheers,
Wilco