From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2057.outbound.protection.outlook.com [40.107.21.57]) by sourceware.org (Postfix) with ESMTPS id 910153858D20 for ; Thu, 13 Apr 2023 14:30:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 910153858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sDyVVBU5z/ofO6PMsCoqnx9mErNXGoszG/OoPCxZvnU=; b=6h+hIiqxvBbU85zV5c810xf+kFt3yAGbbIA0U6mLWzX/7i569QxoqgZ1Gn1rXX9M/utVgLjYM4WC4ABudsi6cDek5ZLHOrn0ny+hw+Pqa7F3iOEI6LbGj5tXCA9/PVuSEEQ00YrFxecaDD+vpghHvKpgiaMUTo6Ae9iLlv+yBP8= Received: from DB8PR06CA0038.eurprd06.prod.outlook.com (2603:10a6:10:120::12) by PAWPR08MB10118.eurprd08.prod.outlook.com (2603:10a6:102:368::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6277.41; Thu, 13 Apr 2023 14:30:07 +0000 Received: from DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:120:cafe::28) by DB8PR06CA0038.outlook.office365.com (2603:10a6:10:120::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6298.32 via Frontend Transport; Thu, 13 Apr 2023 14:30:07 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT021.mail.protection.outlook.com (100.127.142.184) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6298.31 via Frontend Transport; Thu, 13 Apr 2023 14:30:06 +0000 Received: ("Tessian outbound 3a01b65b5aad:v136"); Thu, 13 Apr 2023 14:30:06 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 1b3e8b515e389b2b X-CR-MTA-TID: 64aa7808 Received: from 084bd5223863.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 8FE65D49-B4F9-4785-A7E9-FD2489F93DFF.1; Thu, 13 Apr 2023 14:30:00 +0000 Received: from EUR05-DB8-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 084bd5223863.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 13 Apr 2023 14:30:00 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ZgUmhTUI012iPuQV9D6IRANayhhscg1zNnEsHSsWCVdB0BcwRs5V2O8/Y8cBkcntVw4c1XxXRRltmoOQ03tYoiH2sJwZWmpRIvkdJENWwRim8SsfLblvKgJ3dIE3Fr8xXEx8fJlYgWWhWEmWQ1ryUoyhsyc+Se+K7smdbWsBdWioGiQg+uzpgjQ3uQjjfY/jUY6hg0b9BGEMsQwgBmUO7NuXGe6B1u77//iQzP4sxQxUHkkSL31N4bg5yWq9jkueDslUPc1Befej1WK4cenvWKEQvPCNQRGrNO/97H9Uyfy4vhQQahpESanEAG3iA8UKun7jli0GXl8+iLpaXoMShg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=sDyVVBU5z/ofO6PMsCoqnx9mErNXGoszG/OoPCxZvnU=; b=g880bHShtXxQDFkh4E7PIbA20pUmoqfBEw6tsYCkpeKMeWDDFB1t/1HX/8Pa/WwIinFKDzhKVH5Nk9Adj4g4pYHcC1IjkahDNVVU/uGlbETjlf0dNbrL8AlX6l3ApBiZuJQ5cOiR4XEk7b9knHEJyUI7oGWkoc2pb2XMTT4ToOnBpK5AFze5M413tnMu4O1tIckv9hk6YY0BV0NTufPknfM9uAPsQcU1fW94fskTkYKNSN3XtFl5PJ7MsNXWVDhB/PhCEchN9a5uQ8N9W0HrCCgj+CjHwm7MdOrvo4rY1o2Y2WFbaGYu5fJqNOxVNm9+caqbnW18fCC8ethQaQcL/g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sDyVVBU5z/ofO6PMsCoqnx9mErNXGoszG/OoPCxZvnU=; b=6h+hIiqxvBbU85zV5c810xf+kFt3yAGbbIA0U6mLWzX/7i569QxoqgZ1Gn1rXX9M/utVgLjYM4WC4ABudsi6cDek5ZLHOrn0ny+hw+Pqa7F3iOEI6LbGj5tXCA9/PVuSEEQ00YrFxecaDD+vpghHvKpgiaMUTo6Ae9iLlv+yBP8= Received: from PAWPR08MB8982.eurprd08.prod.outlook.com (2603:10a6:102:33f::20) by AS2PR08MB9870.eurprd08.prod.outlook.com (2603:10a6:20b:595::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6277.38; Thu, 13 Apr 2023 14:29:58 +0000 Received: from PAWPR08MB8982.eurprd08.prod.outlook.com ([fe80::13be:967d:6e80:432f]) by PAWPR08MB8982.eurprd08.prod.outlook.com ([fe80::13be:967d:6e80:432f%9]) with mapi id 15.20.6277.036; Thu, 13 Apr 2023 14:29:58 +0000 From: Wilco Dijkstra To: 'GNU C Library' CC: Adhemerval Zanella Subject: [PATCH] math: Improve fmod(f) performance Thread-Topic: [PATCH] math: Improve fmod(f) performance Thread-Index: AQHZbhJm07aDR1yyTkuekLFyXoMslw== Date: Thu, 13 Apr 2023 14:29:58 +0000 Message-ID: Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: msip_labels: Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; x-ms-traffictypediagnostic: PAWPR08MB8982:EE_|AS2PR08MB9870:EE_|DBAEUR03FT021:EE_|PAWPR08MB10118:EE_ X-MS-Office365-Filtering-Correlation-Id: 8b07b0a4-61a9-4b3e-192a-08db3c2b9411 x-checkrecipientrouted: true nodisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 58XdCqahcVsWY8aZu8ldgArxmssREypVE9qrwbEtHfpWQyOJdq3rYnBJGTKkXlMpHLdiMYRYrXr6qz3q4K/vgUfY8t2UiNWG0DiTC1zFo0vMi3VOuS9NtJuJMqbiwWbdpFHns93QyLvsS9adR3bfaQFbDjO7qPbpZbcf3RstJvrD1e6DKzYc6hjipcVcWWxXnZKS2zUSTF6V4WPDgJSB1Gznrlr6BqdWp7AHvCxLrFCGGJsZXjq1sPxdzQVkc9UY5ZlIwsIqwM5DITMdozjr+hSHJiDeV7LDF3/XLS1DHVZHmcMrgVq/SEgke7yS/NOGARrpA0hP1kdKLtMrG5jf17xDD7yZ7Yv2eOZZCMfoYX0Zh+JYCLIKs9JLvl9HPCUYcsfG7f0IjzWd/IvoL2K9IE+Ov5NM6JxGVXSjZnAKQAjcJYoE0Q8Yx3HVFiMRk33IngdxpEar+zqKhiUAyUdPwQsCtA4QFltzVblBCt7lM3aee5XjDg4qYbqAa77houw9ziVdO9qDZeDuM3HwQC1OQv0PSn1+DvQFy3iNWHPH9nOGDg4xd2ZBcygCIFL3gQY2FaNhWS//99dJZ6PlJXhcrdHJcB6u0ubJvMrWLxxCJx5v5S2b5VggK6VuimVvjVx9 X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PAWPR08MB8982.eurprd08.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230028)(4636009)(366004)(376002)(136003)(346002)(396003)(39860400002)(451199021)(71200400001)(478600001)(7696005)(9686003)(26005)(66476007)(186003)(316002)(91956017)(2906002)(76116006)(66946007)(6506007)(4326008)(66556008)(52536014)(66446008)(41300700001)(6916009)(8936002)(8676002)(64756008)(122000001)(38100700002)(55016003)(5660300002)(83380400001)(33656002)(86362001)(38070700005);DIR:OUT;SFP:1101; Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9870 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: a6d2758d-b2d2-4c35-91f1-08db3c2b8f13 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: MV2JT4ARl4UzdlMVn5cLdLqijX/HRWtHtJs+aNPu72Pz5SENOwA0lhvGQKPL58/FPp3sJS7Mw7Z1G0nSeYr10umBEUdUneAaDAsG75lDM9tDWkzOqMkZzY87SXXF50GxNBs9u2dePKjMzvBnhyjSAKCYOt0REaeMxkW9Vty+ljiWHqH3gi0VvJMPoXtBHWMuRPEvzMqKLypmlKJc/whY8woPjUuQmAQVNqHzgOp0RpjHt6Iap/PX2boJq1nA8ZCctJVbI52jWC1kmkIfhg0DSo0e/2scCzWu7AN3f13JN2z7QLg4pSsZyf2YcyLQDVGxoLLtbbmlhHbWq3SG8rCLwdYQY+4EpXLEEyHrmv2eznMoDhpKhAlL3hUj6lEJNieiMx4twho5HfuhKo/WMTeLfcMvOFW1hBqtlahiLg8X+lmzS5BxCrmUxJL+4YqPha0pSLaEfJ0J5l84UnI3pCPY2KWTpPTGa0Y7r9c66zYivkO2ASseEIoy3CKx6ZLZAv6o8D/CEpuyujKBzqqPcHiCoUVgncsF7Cc54jexM3/URIwU7Gj5ywoEucnyEBMagm0SmbBk5YEw/0CAsIXAN+J1WwhRZed8kXISlbNrAOlhd+wDnQcgBFS+yDr1zWsKMmWKgc6sL7xvanPA3fxIv0S5/mCE3to5d3so6iTsrZmfHKEKlptlA1dXZtFOCeTa+BF4x5xMONcls7STQhoUKVw/tQ== X-Forefront-Antispam-Report: CIP:63.35.35.123;CTRY:IE;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:64aa7808-outbound-1.mta.getcheckrecipient.com;PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com;CAT:NONE;SFS:(13230028)(4636009)(346002)(376002)(396003)(39860400002)(136003)(451199021)(40470700004)(36840700001)(46966006)(2906002)(86362001)(82740400003)(8936002)(356005)(81166007)(52536014)(5660300002)(40460700003)(40480700001)(55016003)(33656002)(82310400005)(336012)(83380400001)(47076005)(107886003)(9686003)(6506007)(26005)(186003)(478600001)(36860700001)(316002)(41300700001)(7696005)(6916009)(8676002)(70586007)(70206006)(4326008);DIR:OUT;SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 13 Apr 2023 14:30:06.8465 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8b07b0a4-61a9-4b3e-192a-08db3c2b9411 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;Ip=[63.35.35.123];Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAWPR08MB10118 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,FORGED_SPF_HELO,GIT_PATCH_0,KAM_DMARC_NONE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE,TXREP,UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of specia= l cases to reduce=0A= amount of code executed before the fast paths. Performance of the close-ex= ponents benchmark=0A= improves by 18-19% on Cortex-A72 and Neoverse V1. Passes regress.=0A= =0A= ---=0A= =0A= diff --git a/sysdeps/ieee754/dbl-64/e_fmod.c b/sysdeps/ieee754/dbl-64/e_fmo= d.c=0A= index caae4e47e2358daced51a22342ec6e34a04f6fce..7c0ef11cb6c53561e64fab4f2ec= e17692dff3e5f 100644=0A= --- a/sysdeps/ieee754/dbl-64/e_fmod.c=0A= +++ b/sysdeps/ieee754/dbl-64/e_fmod.c=0A= @@ -40,10 +40,10 @@=0A= =0A= r =3D=3D x % y =3D=3D (x % (N * y)) % y=0A= =0A= - And with mx/my being mantissa of double floating point number (which us= es=0A= + And with mx/my being mantissa of a double floating point number (which = uses=0A= less bits than the storage type), on each step the argument reduction c= an=0A= be improved by 11 (which is the size of uint64_t minus MANTISSA_WIDTH p= lus=0A= - the signal bit):=0A= + the implicit one bit):=0A= =0A= mx * 2^ex =3D=3D 2^11 * mx * 2^(ex - 11)=0A= =0A= @@ -54,7 +54,12 @@=0A= mx << 11;=0A= ex -=3D 11;=0A= mx %=3D my;=0A= - } */=0A= + }=0A= +=0A= + Special cases:=0A= + - If x or y is a NaN, a NaN is returned.=0A= + - If x is an infinity, or y is zero, a NaN is returned and EDOM is se= t.=0A= + - If x is +0/-0, and y is not zero, +0/-0 is returned. */=0A= =0A= double=0A= __fmod (double x, double y)=0A= @@ -67,62 +72,70 @@ __fmod (double x, double y)=0A= hx ^=3D sx;=0A= hy &=3D ~SIGN_MASK;=0A= =0A= - /* Special cases:=0A= - - If x or y is a Nan, NaN is returned.=0A= - - If x is an inifinity, a NaN is returned and EDOM is set.=0A= - - If y is zero, Nan is returned.=0A= - - If x is +0/-0, and y is not zero, +0/-0 is returned. */=0A= - if (__glibc_unlikely (hy =3D=3D 0=0A= - || hx >=3D EXPONENT_MASK || hy > EXPONENT_MASK))=0A= - {=0A= - if (is_nan (hx) || is_nan (hy))=0A= - return (x * y) / (x * y);=0A= - return __math_edom ((x * y) / (x * y));=0A= - }=0A= -=0A= - if (__glibc_unlikely (hx <=3D hy))=0A= + /* If x < y, return x (unless y is a NaN). */=0A= + if (__glibc_likely (hx < hy))=0A= {=0A= - if (hx < hy)=0A= - return x;=0A= - return asdouble (sx);=0A= + /* If y is a NaN, return a NaN. */=0A= + if (__glibc_unlikely (hy > EXPONENT_MASK))=0A= + return x * y;=0A= + return x;=0A= }=0A= =0A= int ex =3D hx >> MANTISSA_WIDTH;=0A= int ey =3D hy >> MANTISSA_WIDTH;=0A= + int exp_diff =3D ex - ey;=0A= +=0A= + /* Common case where exponents are close: |x/y| < 2^12, x not inf/NaN=0A= + and |x%y| not denormal. */=0A= + if (__glibc_likely (ey < (EXPONENT_MASK >> MANTISSA_WIDTH) - EXPONENT_WI= DTH=0A= + && ey > MANTISSA_WIDTH=0A= + && exp_diff <=3D EXPONENT_WIDTH))=0A= + {=0A= + uint64_t mx =3D (hx << EXPONENT_WIDTH) | SIGN_MASK;=0A= + uint64_t my =3D (hy << EXPONENT_WIDTH) | SIGN_MASK;=0A= +=0A= + mx %=3D (my >> exp_diff);=0A= +=0A= + if (__glibc_unlikely (mx =3D=3D 0))=0A= + return asdouble (sx);=0A= + int shift =3D __builtin_clzll (mx);=0A= + ex -=3D shift + 1;=0A= + mx <<=3D shift;=0A= + mx =3D sx | (mx >> EXPONENT_WIDTH);=0A= + return asdouble (mx + ((uint64_t)ex << MANTISSA_WIDTH));=0A= + }=0A= =0A= - /* Common case where exponents are close: ey >=3D -907 and |x/y| < 2^52,= */=0A= - if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <=3D EXPONENT_WIDTH))= =0A= + if (__glibc_unlikely (hy =3D=3D 0 || hx >=3D EXPONENT_MASK))=0A= {=0A= - uint64_t mx =3D (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1);=0A= - uint64_t my =3D (hy & MANTISSA_MASK) | (MANTISSA_MASK + 1);=0A= + /* If x is a NaN, return a NaN. */=0A= + if (hx > EXPONENT_MASK)=0A= + return x * y;=0A= =0A= - uint64_t d =3D (ex =3D=3D ey) ? (mx - my) : (mx << (ex - ey)) % my;= =0A= - return make_double (d, ey - 1, sx);=0A= + /* If x is an infinity or y is zero, return a NaN and set EDOM. */= =0A= + return __math_edom ((x * y) / (x * y));=0A= }=0A= =0A= - /* Special case, both x and y are subnormal. */=0A= - if (__glibc_unlikely (ex =3D=3D 0 && ey =3D=3D 0))=0A= + /* Special case, both x and y are denormal. */=0A= + if (__glibc_unlikely (ex =3D=3D 0))=0A= return asdouble (sx | hx % hy);=0A= =0A= - /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'. Assume that hx i= s=0A= - not subnormal by conditions above. */=0A= + /* Extract normalized mantissas - hx is not denormal and hy !=3D 0. */= =0A= uint64_t mx =3D get_mantissa (hx) | (MANTISSA_MASK + 1);=0A= - ex--;=0A= uint64_t my =3D get_mantissa (hy) | (MANTISSA_MASK + 1);=0A= -=0A= int lead_zeros_my =3D EXPONENT_WIDTH;=0A= - if (__glibc_likely (ey > 0))=0A= - ey--;=0A= - else=0A= +=0A= + ey--;=0A= + /* Special case for denormal y. */=0A= + if (__glibc_unlikely (ey < 0))=0A= {=0A= my =3D hy;=0A= + ey =3D 0;=0A= + exp_diff--;=0A= lead_zeros_my =3D clz_uint64 (my);=0A= }=0A= =0A= - /* Assume hy !=3D 0 */=0A= int tail_zeros_my =3D ctz_uint64 (my);=0A= int sides_zeroes =3D lead_zeros_my + tail_zeros_my;=0A= - int exp_diff =3D ex - ey;=0A= =0A= int right_shift =3D exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my;= =0A= my >>=3D right_shift;=0A= @@ -141,8 +154,7 @@ __fmod (double x, double y)=0A= if (exp_diff =3D=3D 0)=0A= return make_double (mx, ey, sx);=0A= =0A= - /* Assume modulo/divide operation is slow, so use multiplication with in= vert=0A= - values. */=0A= + /* Multiplication with the inverse is faster than repeated modulo. */= =0A= uint64_t inv_hy =3D UINT64_MAX / my;=0A= while (exp_diff > sides_zeroes) {=0A= exp_diff -=3D sides_zeroes;=0A= diff --git a/sysdeps/ieee754/flt-32/e_fmodf.c b/sysdeps/ieee754/flt-32/e_fm= odf.c=0A= index 763900efdaa8222431e189e50a7e62baf465a54d..14f3fcae256d03ceb4bf91898a7= a003bbfcb854a 100644=0A= --- a/sysdeps/ieee754/flt-32/e_fmodf.c=0A= +++ b/sysdeps/ieee754/flt-32/e_fmodf.c=0A= @@ -40,10 +40,10 @@=0A= =0A= r =3D=3D x % y =3D=3D (x % (N * y)) % y=0A= =0A= - And with mx/my being mantissa of double floating point number (which us= es=0A= + And with mx/my being mantissa of a single floating point number (which = uses=0A= less bits than the storage type), on each step the argument reduction c= an=0A= be improved by 8 (which is the size of uint32_t minus MANTISSA_WIDTH pl= us=0A= - the signal bit):=0A= + the implicit one bit):=0A= =0A= mx * 2^ex =3D=3D 2^8 * mx * 2^(ex - 8)=0A= =0A= @@ -54,7 +54,12 @@=0A= mx << 8;=0A= ex -=3D 8;=0A= mx %=3D my;=0A= - } */=0A= + }=0A= +=0A= + Special cases:=0A= + - If x or y is a NaN, a NaN is returned.=0A= + - If x is an infinity, or y is zero, a NaN is returned and EDOM is se= t.=0A= + - If x is +0/-0, and y is not zero, +0/-0 is returned. */=0A= =0A= float=0A= __fmodf (float x, float y)=0A= @@ -67,61 +72,69 @@ __fmodf (float x, float y)=0A= hx ^=3D sx;=0A= hy &=3D ~SIGN_MASK;=0A= =0A= - /* Special cases:=0A= - - If x or y is a Nan, NaN is returned.=0A= - - If x is an inifinity, a NaN is returned.=0A= - - If y is zero, Nan is returned.=0A= - - If x is +0/-0, and y is not zero, +0/-0 is returned. */=0A= - if (__glibc_unlikely (hy =3D=3D 0=0A= - || hx >=3D EXPONENT_MASK || hy > EXPONENT_MASK))=0A= - {=0A= - if (is_nan (hx) || is_nan (hy))=0A= - return (x * y) / (x * y);=0A= - return __math_edomf ((x * y) / (x * y));=0A= - }=0A= -=0A= - if (__glibc_unlikely (hx <=3D hy))=0A= + if (__glibc_likely (hx < hy))=0A= {=0A= - if (hx < hy)=0A= - return x;=0A= - return asfloat (sx);=0A= + /* If y is a NaN, return a NaN. */=0A= + if (__glibc_unlikely (hy > EXPONENT_MASK))=0A= + return x * y;=0A= + return x;=0A= }=0A= =0A= int ex =3D hx >> MANTISSA_WIDTH;=0A= int ey =3D hy >> MANTISSA_WIDTH;=0A= + int exp_diff =3D ex - ey;=0A= +=0A= + /* Common case where exponents are close: |x/y| < 2^9, x not inf/NaN=0A= + and |x%y| not denormal. */=0A= + if (__glibc_likely (ey < (EXPONENT_MASK >> MANTISSA_WIDTH) - EXPONENT_WI= DTH=0A= + && ey > MANTISSA_WIDTH=0A= + && exp_diff <=3D EXPONENT_WIDTH))=0A= + {=0A= + uint32_t mx =3D (hx << EXPONENT_WIDTH) | SIGN_MASK;=0A= + uint32_t my =3D (hy << EXPONENT_WIDTH) | SIGN_MASK;=0A= +=0A= + mx %=3D (my >> exp_diff);=0A= +=0A= + if (__glibc_unlikely (mx =3D=3D 0))=0A= + return asfloat (sx);=0A= + int shift =3D __builtin_clz (mx);=0A= + ex -=3D shift + 1;=0A= + mx <<=3D shift;=0A= + mx =3D sx | (mx >> EXPONENT_WIDTH);=0A= + return asfloat (mx + ((uint32_t)ex << MANTISSA_WIDTH));=0A= + }=0A= =0A= - /* Common case where exponents are close: ey >=3D -103 and |x/y| < 2^8, = */=0A= - if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <=3D EXPONENT_WIDTH))= =0A= + if (__glibc_unlikely (hy =3D=3D 0 || hx >=3D EXPONENT_MASK))=0A= {=0A= - uint64_t mx =3D (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1);=0A= - uint64_t my =3D (hy & MANTISSA_MASK) | (MANTISSA_MASK + 1);=0A= + /* If x is a NaN, return a NaN. */=0A= + if (hx > EXPONENT_MASK)=0A= + return x * y;=0A= =0A= - uint32_t d =3D (ex =3D=3D ey) ? (mx - my) : (mx << (ex - ey)) % my;= =0A= - return make_float (d, ey - 1, sx);=0A= + /* If x is an infinity or y is zero, return a NaN and set EDOM. */= =0A= + return __math_edomf ((x * y) / (x * y));=0A= }=0A= =0A= - /* Special case, both x and y are subnormal. */=0A= - if (__glibc_unlikely (ex =3D=3D 0 && ey =3D=3D 0))=0A= + /* Special case, both x and y are denormal. */=0A= + if (__glibc_unlikely (ex =3D=3D 0))=0A= return asfloat (sx | hx % hy);=0A= =0A= - /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'. Assume that hx i= s=0A= - not subnormal by conditions above. */=0A= + /* Extract normalized mantissas - hx is not denormal and hy !=3D 0. */= =0A= uint32_t mx =3D get_mantissa (hx) | (MANTISSA_MASK + 1);=0A= - ex--;=0A= -=0A= uint32_t my =3D get_mantissa (hy) | (MANTISSA_MASK + 1);=0A= int lead_zeros_my =3D EXPONENT_WIDTH;=0A= - if (__glibc_likely (ey > 0))=0A= - ey--;=0A= - else=0A= +=0A= + ey--;=0A= + /* Special case for denormal y. */=0A= + if (__glibc_unlikely (ey < 0))=0A= {=0A= my =3D hy;=0A= + ey =3D 0;=0A= + exp_diff--;=0A= lead_zeros_my =3D __builtin_clz (my);=0A= }=0A= =0A= int tail_zeros_my =3D __builtin_ctz (my);=0A= int sides_zeroes =3D lead_zeros_my + tail_zeros_my;=0A= - int exp_diff =3D ex - ey;=0A= =0A= int right_shift =3D exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my;= =0A= my >>=3D right_shift;=0A= @@ -140,8 +153,7 @@ __fmodf (float x, float y)=0A= if (exp_diff =3D=3D 0)=0A= return make_float (mx, ey, sx);=0A= =0A= - /* Assume modulo/divide operation is slow, so use multiplication with in= vert=0A= - values. */=0A= + /* Multiplication with the inverse is faster than repeated modulo. */= =0A= uint32_t inv_hy =3D UINT32_MAX / my;=0A= while (exp_diff > sides_zeroes) {=0A= exp_diff -=3D sides_zeroes;=0A=