Date: Wed, 18 Aug 2021 13:51:19 +0100
From: Szabolcs Nagy
To: Stefan Kanthak
Cc: libc-help@sourceware.org
Subject: Re: nextafter() about an order of magnitude slower than trivial implementation
Message-ID: <20210818125119.GH25257@arm.com>
List-Id: Libc-help mailing list

The 08/16/2021 18:03, Stefan Kanthak wrote:
> Testing and benchmarking an exponential function that consumes
> about 10ns/call on an AMD EPYC 7262, I noticed that nextafter()
> itself is DEAD SLOW: it consumes about 8.5ns/call!
>
> The following (quick&dirty) implementation consumes 0.7ns/call,
> i.e. is about an order of magnitude faster:

correctly measuring latency on a modern x86_64 core:

musl: 3.16 ns
glibc: 5.68 ns
your: 5.72 ns

i.e. your code is slower.
(this is the correctly predicted hot path latency
measuring for(i=0;i

> --- after.c ---
> double nextafter(double from, double to)
> {
>     if (from == to)
>         return to;
>
>     if (to != to)
>         return to;
>
>     if (from != from)
>         return from;
>
>     if (from == 0.0)
>         return to < 0.0 ? -0x1.0p-1074 : 0x1.0p-1074;
>
>     unsigned long long ull = *(unsigned long long *) &from;
>
>     if ((from < to) == (from < 0.0))
>         ull--;
>     else
>         ull++;
>
>     return 0.0 + *(double *) &ull;
> }
> --- EOF ---
>
> JFTR: the code generated by GCC for this function is FAR from
>       optimal; the assembly implementation shown below is more
>       than 2x faster and consumes <0.3ns/call, i.e. is about
>       30x faster than glibc's nextafter()!
>
> The data from 'perf stat' show that glibc's nextafter() executes
> about 30 instructions more than the above trivial implementation,
> and that it is NOT well suited for modern super-scalar processors.
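
The pointer casts in the quoted after.c read a double through an unsigned long long lvalue, which is undefined behaviour under C's effective-type (strict aliasing) rules. A minimal sketch of the same bit-stepping idea with memcpy doing the conversions follows. The name nextafter_sketch is hypothetical, chosen only to avoid clashing with libm, and the sketch makes no attempt to raise floating-point exceptions or set errno the way a libm nextafter() may.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; not the glibc or musl implementation. */
double nextafter_sketch(double from, double to)
{
    uint64_t bits;

    /* The bit stepping below assumes a 64-bit IEEE 754 double. */
    assert(sizeof bits == sizeof from);

    if (from != from)            /* from is NaN */
        return from;
    if (to != to)                /* to is NaN */
        return to;
    if (from == to)              /* also true for +0.0 compared with -0.0 */
        return to;
    if (from == 0.0)             /* step off zero to the smallest subnormal */
        return to < 0.0 ? -0x1.0p-1074 : 0x1.0p-1074;

    /* memcpy is the defined-behaviour way to reinterpret the bits. */
    memcpy(&bits, &from, sizeof bits);

    if ((from < to) == (from < 0.0))
        bits--;                  /* magnitude shrinks: move toward zero */
    else
        bits++;                  /* magnitude grows: move away from zero */

    memcpy(&from, &bits, sizeof from);
    return from;
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so the change need not cost anything at -O2 or above.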
>
> Stefan
>
> PS: test program and logs
>
> [stefan@rome ~]$ cat bench.c
> // Copyright (C) 2005-2021, Stefan Kanthak
>
> #include <math.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <time.h>
>
> __attribute__((noinline))
> double nop(double foo, double bar)
> {
>     return foo + bar;
> }
>
> inline static
> double lfsr64(void)
> {
>     // 64-bit linear feedback shift register (Galois form) using
>     // primitive polynomial 0xAD93D23594C935A9 (CRC-64 "Jones"),
>     // initialised with 2**64 / golden ratio
>
>     static uint64_t lfsr = 0x9E3779B97F4A7C15;
>     const uint64_t sign = (int64_t) lfsr >> 63;
>
>     lfsr = (lfsr << 1) ^ (0xAD93D23594C935A9 & sign);
>
>     return *(double *) &lfsr;
> }
>
> inline static
> double random64(void)
> {
>     static uint64_t seed = 0x0123456789ABCDEF;
>
>     seed = seed * 6364136223846793005 + 1442695040888963407;
>
>     return *(double *) &seed;
> }
>
> int main(void)
> {
>     clock_t t0, t1, t2, tt;
>     uint32_t n;
>     volatile double result;
>
>     t0 = clock();
>
>     for (n = 500000000u; n > 0u; n--) {
>         result = nop(lfsr64(), 0.0);
>         result = nop(random64(), 1.0 / 0.0);
>     }
>
>     t1 = clock();
>
>     for (n = 500000000u; n > 0u; n--) {
>         result = nextafter(lfsr64(), 0.0);
>         result = nextafter(random64(), 1.0 / 0.0);
>     }
>
>     t2 = clock();
>     tt = t2 - t0; t2 -= t1; t1 -= t0; t0 = t2 - t1;
>
>     printf("\n"
>            "nop() %4lu.%06lu 0\n"
>            "nextafter() %4lu.%06lu %4lu.%06lu\n"
>            " %4lu.%06lu nano-seconds\n",
>            t1 / CLOCKS_PER_SEC, (t1 % CLOCKS_PER_SEC) * 1000000u / CLOCKS_PER_SEC,
>            t2 / CLOCKS_PER_SEC, (t2 % CLOCKS_PER_SEC) * 1000000u / CLOCKS_PER_SEC,
>            t0 / CLOCKS_PER_SEC, (t0 % CLOCKS_PER_SEC) * 1000000u / CLOCKS_PER_SEC,
>            tt / CLOCKS_PER_SEC, (tt % CLOCKS_PER_SEC) * 1000000u / CLOCKS_PER_SEC);
> }
> [stefan@rome ~]$ gcc -O3 -lm bench.c
> [stefan@rome ~]$ perf stat ./a.out
>
> nop() 1.480000 0
> nextafter() 10.060000 8.580000
> 11.540000 nano-seconds
>
>  Performance counter stats for './a.out':
>
>          11,548.78 msec task-clock:u              #    1.000 CPUs utilized
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>                145      page-faults:u             #    0.013 K/sec
>     38,917,213,536      cycles:u                  #    3.370 GHz                      (83.33%)
>     ~~~~~~~~~~~~~~
>     15,647,656,615      stalled-cycles-frontend:u #   40.21% frontend cycles idle     (83.33%)
>     10,746,044,422      stalled-cycles-backend:u  #   27.61% backend cycles idle      (83.33%)
>     69,739,403,870      instructions:u            #    1.79 insn per cycle
>     ~~~~~~~~~~~~~~                                     ~~~~
>                                                   #    0.22 stalled cycles per insn   (83.33%)
>     16,495,748,110      branches:u                # 1428.354 M/sec                    (83.33%)
>        500,701,246      branch-misses:u           #    3.04% of all branches          (83.33%)
>
>       11.550414029 seconds time elapsed
>
>       11.548265000 seconds user
>        0.000999000 seconds sys
>
>
> [stefan@rome ~]$ gcc -O3 bench.c after.c
> [stefan@rome ~]$ perf stat ./a.out
>
> nop() 1.490000 0
> nextafter() 2.210000 0.720000
> 3.700000 nano-seconds
>
>  Performance counter stats for './a.out':
>
>           3,702.89 msec task-clock:u              #    1.000 CPUs utilized
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>                122      page-faults:u             #    0.033 K/sec
>     12,407,345,183      cycles:u                  #    3.351 GHz                      (83.32%)
>     ~~~~~~~~~~~~~~
>            135,817      stalled-cycles-frontend:u #    0.00% frontend cycles idle     (83.34%)
>      5,498,895,906      stalled-cycles-backend:u  #   44.32% backend cycles idle      (83.34%)
>     38,002,430,460      instructions:u            #    3.06 insn per cycle
>     ~~~~~~~~~~~~~~                                     ~~~~
>                                                   #    0.14 stalled cycles per insn   (83.34%)
>      7,497,381,393      branches:u                # 2024.735 M/sec                    (83.34%)
>            497,462      branch-misses:u           #    0.01% of all branches          (83.32%)
>
>        3.703648875 seconds time elapsed
>
>        3.703294000 seconds user
>        0.000000000 seconds sys
>
>
> [stefan@rome ~]$ cat after.s
> # Copyright © 2004-2021, Stefan Kanthak
>
> .arch   generic64
> .code64
> .intel_syntax noprefix
> .text
>                                 # xmm0 = from
>                                 # xmm1 = to
> nextafter:
>         xorpd   xmm2, xmm2      # xmm2 = 0.0
>         ucomisd xmm1, xmm2      # CF = (to < 0.0)
>         jp      .Lto            # to = NAN?
>         sbb     rax, rax        # rax = (to < 0.0) ? -1 : 0
>         ucomisd xmm0, xmm1      # CF = (from < to)
>         jp      .Lfrom          # from = NAN?
>         je      .Lto            # from = to?
> .Lnotequal:
>         sbb     rcx, rcx        # rcx = (from < to) ? -1 : 0
>         ucomisd xmm0, xmm2      # CF = (from < 0.0)
>         jz      .Lzero          # from = 0.0?
> .Lnext:
>         movq    rdx, xmm0       # rdx = from
>         sbb     rax, rax        # rax = (from < 0.0) ? -1 : 0
>         xor     rax, rcx        # rax = (from < 0.0) = (from < to) ? 0 : -1
>         or      rax, 1          # rax = (from < 0.0) = (from < to) ? 1 : -1
>         sub     rdx, rax
>         movq    xmm0, rdx
>         addsd   xmm0, xmm2
>         ret
> .Lzero:
>         shl     rax, 63         # rax = (to < -0.0) ? 0x8000000000000000 : 0
>         or      rax, 1          # rax = (to < -0.0) ? 0x8000000000000001 : 1
>         movq    xmm0, rax       # xmm0 = (to < -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
>         addsd   xmm0, xmm2
>         ret
> .Lto:
>         movsd   xmm0, xmm1      # xmm0 = to
> .Lfrom:
>         ret
>
> .size   nextafter, .-nextafter
> .type   nextafter, @function
> .global nextafter
> .end
> [stefan@rome ~]$ perf stat ./a.out
>
> nop() 1.630000 0
> nextafter() 1.910000 0.280000
> 3.540000 nano-seconds
>
>  Performance counter stats for './a.out':
>
>           3,547.12 msec task-clock:u              #    1.000 CPUs utilized
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>                123      page-faults:u             #    0.035 K/sec
>     11,949,840,797      cycles:u                  #    3.369 GHz                      (83.32%)
>     ~~~~~~~~~~~~~~
>            129,627      stalled-cycles-frontend:u #    0.00% frontend cycles idle     (83.34%)
>      3,998,960,716      stalled-cycles-backend:u  #   33.46% backend cycles idle      (83.34%)
>     37,493,523,285      instructions:u            #    3.14 insn per cycle
>     ~~~~~~~~~~~~~~                                     ~~~~
>                                                   #    0.11 stalled cycles per insn   (83.34%)
>      7,998,559,192      branches:u                # 2254.945 M/sec                    (83.34%)
>            497,565      branch-misses:u           #    0.01% of all branches          (83.32%)
>
>        3.547999008 seconds time elapsed
>
>        3.546671000 seconds user
>        0.000999000 seconds sys
>
>
> [stefan@rome ~]$
> --
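
The "about 30 instructions more" figure can be checked against the quoted perf counters. The sketch below rests on one assumption: that the nop() loop and the input generators cost the same in both runs, so the difference between whole-run counters belongs to the 1000000000 nextafter() calls (500000000 iterations, two calls each). The names per_call and print_per_call_deltas are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* nextafter() calls made by the quoted bench.c: 500000000 iterations x 2. */
#define NEXTAFTER_CALLS 1000000000.0

/* Difference between two whole-run counters, averaged per nextafter() call. */
double per_call(double glibc_total, double trivial_total)
{
    return (glibc_total - trivial_total) / NEXTAFTER_CALLS;
}

/* Counter values copied from the glibc run and the after.c run above. */
void print_per_call_deltas(void)
{
    printf("extra instructions per call:  %.1f\n",
           per_call(69739403870.0, 38002430460.0));
    printf("extra cycles per call:        %.1f\n",
           per_call(38917213536.0, 12407345183.0));
    printf("extra branches per call:      %.1f\n",
           per_call(16495748110.0, 7497381393.0));
    printf("extra branch misses per call: %.2f\n",
           per_call(500701246.0, 497462.0));
}
```

With the quoted counters this comes to roughly 31.7 extra instructions, 26.5 extra cycles, 9.0 extra branches and 0.50 extra branch misses per call for the glibc run, i.e. about one mispredicted branch for every two calls on these random inputs.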