From: Wilco Dijkstra
To: Adhemerval Zanella Netto, 'GNU C Library'
Subject: Re: [PATCH] math: Improve fmod(f) performance
Date: Thu, 13 Apr 2023 20:45:38 +0000
In-Reply-To: <0baece75-8f99-da08-4094-18f99238cb12@linaro.org>

Hi Adhemerval,

> So at least with current 'close-exponents' from bench-fmod, which was
> generated from exponents between -10 and 10, the gain is more modest
> (and normal inputs does show a small regression).  This should be ok,
> but I also think we need to outline that A72 gains might not show on
> different hardware.

On a Skylake I'm seeing this for fmod:

                  master   patch
subnormals         51.34    45.92 (+11.8%)
normal             436.9    420.5 (+3.9%)
close-exponents    56.44    53.11 (+6.3%)

And on Zen 2:

                  master   patch
subnormals         10.83    10.39 (+4.2%)
normal             336.1    335.8 (+0.1%)
close-exponents    14.90    14.11 (+5.6%)

So it shows good improvements across the board. It's odd that your results on
AMD are worse than my Zen 2 results - are there large variations between runs?
I did quite a few runs to get a fast result and increased the iterations of the
math benchmarks 10x.

I can't explain why the gains on AArch64 are so much larger - the reduced
instruction counts and branches for the common cases seem to make a big
difference. On x86 there are still many MOVABS instructions, which are
problematic for decode.

> So maybe also add another bench-fmod set for |x/y| < 2^12 to show
> the potential gains.

I'm not sure how that would improve things - ideally we need more realistic
inputs (i.e. actual traces), but we could change the existing inputs into
workloads to give it a more difficult problem. Changing close-exponents into a
workload shows 11.0% lower latency and 11.9% better throughput on my Skylake.
On Zen 2 I see 1% lower latency and 7.4% better throughput. Neoverse V1 shows
25.1% lower latency and 23.9% better throughput.

Cheers,
Wilco