From: Wilco Dijkstra
To: Adhemerval Zanella
CC: 'GNU C Library'
Subject: [PATCH v2 4/9] math: Use an improved algorithm for hypot (dbl-64)
Date: Thu, 28 Oct 2021 15:19:23 +0000
List-Id: Libc-alpha mailing list

Hi Adhemerval,

On AArch64 the new version is ~25% slower even after removing the wrappers.
Using the faster checks for special cases helps here too, of course. However,
there is another issue: issignaling is not inlined, which forces a stack frame
to be created and several callee-saved registers to be spilled.
This fixes it:

  if (!isfinite (x) || !isfinite (y))
    {
      if ((isinf (x) || isinf (y))
          && !issignaling_inline (x) && !issignaling_inline (y))
        return INFINITY;
      return x + y;
    }

This gave a 20% speedup, so it is now only slightly slower than before.
We could rename the inline function to __issignaling, I suppose; it would
then be used wherever math_config.h is included.

+  h = math_narrow_eval (h * scale);
+  math_check_force_underflow_nonneg (h);
+  return h;

I don't think you need math_check_force_underflow_nonneg at all, given that
h * scale should already set the underflow flag correctly. Removing it
brings performance to within ~2% of the original.

Wilco
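For reference, here is a self-contained sketch of an inlinable signaling-NaN test and the special-case path quoted above. It assumes IEEE 754 binary64 doubles and the convention that a quiet NaN has the top significand bit set, which holds on AArch64 and x86-64. The names issignaling_sketch and hypot_special_sketch are hypothetical stand-ins and are not glibc's math_config.h definitions.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inline signaling-NaN test (not glibc's definition).
   Assumes IEEE 754 binary64 where a quiet NaN has the top significand
   bit (0x0008000000000000) set, as on AArch64 and x86-64.  */
static inline int
issignaling_sketch (double x)
{
  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);
  /* Flipping the quiet bit maps an sNaN above 0x7ff8000000000000 and
     everything else (finite, Inf, qNaN) to at most that value; the
     multiply by 2 discards the sign bit.  */
  return 2 * (ix ^ 0x0008000000000000ULL) > 2 * 0x7ff8000000000000ULL;
}

/* Special-case path for hypot when x or y is Inf or NaN: Inf wins over
   a quiet NaN, but an sNaN must still yield NaN (and raise FE_INVALID),
   which x + y does.  */
static double
hypot_special_sketch (double x, double y)
{
  if ((isinf (x) || isinf (y))
      && !issignaling_sketch (x) && !issignaling_sketch (y))
    return INFINITY;
  return x + y;
}
```

Because the helper is only a few integer operations, the compiler should be able to inline it, so the special-case path need not set up a stack frame.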