From: Wilco Dijkstra
To: Adhemerval Zanella Netto, "H.J. Lu"
CC: "libc-alpha@sourceware.org", kirill
Subject: Re: [PATCH v2 3/5] math: Improve fmod
Date: Fri, 17 Mar 2023 14:55:40 +0000
References: <20230315205910.4120377-1-adhemerval.zanella@linaro.org> <20230315205910.4120377-4-adhemerval.zanella@linaro.org>

Hi Adhemerval,

>> It's these cases where x87 is still faster than the generic version:
>>
>>> E5-2640          | close-exponents | 39.298   | 22.2742
>>>
>>> i7-4510U         | close-exponents | 29.463   | 22.8572
>>
>> Are these mostly x < y or
>> cases where the exponent difference is just over 11 and
>> thus we do not use the fast path?
>
> In fact the fast path will be used on ~83% of the cases (849 from 1024 entries).
> Profiling shows that the initial checks might be the culprit, since generic
> compat wrapper uses compiler builtins that might map to fp instructions. But even
> trying to mimic did not improve much.  It seems that for some CPU the integer
> operations to create the final floating number is what is costly.

If it is mostly the fast path, we could tune it further and reduce the instruction count.
It takes 6 if statements to enter this fast path; we could reduce that to 3. There are
also several large constants which could be simplified (older x86 cores might have
issues with multiple 10-byte MOVABS in the instruction stream).

Also, I think your results for generic above use the wrapper, so we'd still get the
>20% speedup, which should make things closer.

Cheers,
Wilco
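
For readers following the thread, here is a minimal sketch of the kind of fast path being discussed. It is written under stated assumptions and is not copied from the patch. The name fmod_fast_path, the helper functions, the macro names, and the folding of the exponent range test into one unsigned comparison are all hypothetical choices made for illustration, and __builtin_clzll assumes GCC or Clang. The function handles only the case where |x| >= |y|, y is a normal number, and the exponents differ by at most 11. It reports every other input back to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MANTISSA_WIDTH 52
#define EXPONENT_WIDTH 11
#define EXPONENT_MAX   0x7ff
#define SIGN_MASK      UINT64_C (0x8000000000000000)

/* Smallest biased exponent of y for which the remainder stays normal,
   and the width of the accepted exponent range (both illustrative).  */
#define EY_MIN  (MANTISSA_WIDTH + 1)
#define EY_SPAN (EXPONENT_MAX - EXPONENT_WIDTH - EY_MIN)

static uint64_t
as_u64 (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

static double
as_f64 (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Hypothetical helper: returns true and stores fmod (x, y) in *result
   when the close-exponent fast path applies, false otherwise.  */
bool
fmod_fast_path (double x, double y, double *result)
{
  uint64_t hx = as_u64 (x);
  uint64_t hy = as_u64 (y) & ~SIGN_MASK;
  uint64_t sx = hx & SIGN_MASK;
  hx ^= sx;

  int ex = (int) (hx >> MANTISSA_WIDTH);
  int ey = (int) (hy >> MANTISSA_WIDTH);
  int exp_diff = ex - ey;

  /* Three comparisons: |x| >= |y|; ey inside [EY_MIN, EY_MIN + EY_SPAN),
     which rejects zero, subnormal, huge, infinite and NaN y; and an
     exponent gap of at most 11, which also rejects infinite or NaN x.  */
  if (!(hx >= hy
        && (unsigned) (ey - EY_MIN) < (unsigned) EY_SPAN
        && exp_diff <= EXPONENT_WIDTH))
    return false;

  /* Left-align both significands with the implicit bit at bit 63.  */
  uint64_t mx = (hx << EXPONENT_WIDTH) | SIGN_MASK;
  uint64_t my = (hy << EXPONENT_WIDTH) | SIGN_MASK;

  /* Shifting y right by the exponent gap puts both values on the same
     scale, so one integer remainder gives the exact result.  */
  mx %= my >> exp_diff;

  if (mx == 0)
    {
      *result = as_f64 (sx);	/* Exact multiple: zero with the sign of x.  */
      return true;
    }

  /* Renormalise: move the leading one back to bit 63, then to bit 52,
     where adding the exponent field absorbs it as the implicit bit.  */
  int shift = __builtin_clzll (mx);
  ex -= shift + 1;
  assert (ex >= 0);		/* Guaranteed by the EY_MIN bound.  */
  mx = sx | ((mx << shift) >> EXPONENT_WIDTH);
  *result = as_f64 (mx + ((uint64_t) ex << MANTISSA_WIDTH));
  return true;
}
```

A caller would try fmod_fast_path first and fall back to the general algorithm when it returns false. For example, fmod_fast_path (5.5, 2.0, &r) succeeds with r == 1.5, while fmod_fast_path (1.0, 3.0, &r) returns false because |x| < |y|. Whether folding the checks this way actually helps on the older x86 cores mentioned in the thread would need to be measured.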