From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 5 Oct 2023 17:00:22 +0100
From: Szabolcs Nagy
To: Joe Ramsay
Subject: Re: [PATCH] aarch64: Improve vecmath sin routines
References: <20231004105809.50464-1-Joe.Ramsay@arm.com>
In-Reply-To: <20231004105809.50464-1-Joe.Ramsay@arm.com>
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
MIME-Version: 1.0

The 10/04/2023 11:58, Joe Ramsay wrote:
>   * Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
>   * Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
>   * Improve register use near special-case branch
>
> Also use overloaded intrinsics for SVE.

looks good. committed.

> ---
> Subsumes a patch from August which replaced FTRIG instructions with polynomial.
> Thanks,
> Joe
>  sysdeps/aarch64/fpu/sin_advsimd.c |   2 +-
>  sysdeps/aarch64/fpu/sin_sve.c     | 102 +++++++++++++++---------------
>  sysdeps/aarch64/fpu/sinf_sve.c    |  44 +++++++------
>  3 files changed, 75 insertions(+), 73 deletions(-)
>
> diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
> index 0389b334cc..55644c4cc6 100644
> --- a/sysdeps/aarch64/fpu/sin_advsimd.c
> +++ b/sysdeps/aarch64/fpu/sin_advsimd.c
> @@ -24,7 +24,7 @@ static const struct data
>    float64x2_t poly[7];
>    float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
>  } data = {
> -  /* Worst-case error is 2.8 ulp in [-pi/2, pi/2]. */
> +  /* Worst-case error is 2.87 ulp in [-pi/2, pi/2]. */
>    .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
>              V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
>              V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
> diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c
> index c3f450d0ea..9e7f5ff684 100644
> --- a/sysdeps/aarch64/fpu/sin_sve.c
> +++ b/sysdeps/aarch64/fpu/sin_sve.c
> @@ -21,20 +21,23 @@
>
>  static const struct data
>  {
> -  double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3,
> -    shift;
> +  double inv_pi, pi_1, pi_2, pi_3, shift, range_val;
> +  double poly[7];
>  } data = {
> -  /* Polynomial coefficients are hard-wired in the FTMAD instruction. */
> +  /* Worst-case error is 2.87 ulp in [-pi/2, pi/2]. */
> +  .poly = { -0x1.555555555547bp-3, 0x1.1111111108a4dp-7, -0x1.a01a019936f27p-13,
> +            0x1.71de37a97d93ep-19, -0x1.ae633919987c6p-26,
> +            0x1.60e277ae07cecp-33, -0x1.9e9540300a1p-41, },
> +
>    .inv_pi = 0x1.45f306dc9c883p-2,
> -  .half_pi = 0x1.921fb54442d18p+0,
> -  .inv_pi_over_2 = 0x1.45f306dc9c882p-1,
> -  .pi_over_2_1 = 0x1.921fb50000000p+0,
> -  .pi_over_2_2 = 0x1.110b460000000p-26,
> -  .pi_over_2_3 = 0x1.1a62633145c07p-54,
> -  .shift = 0x1.8p52
> +  .pi_1 = 0x1.921fb54442d18p+1,
> +  .pi_2 = 0x1.1a62633145c06p-53,
> +  .pi_3 = 0x1.c1cd129024e09p-106,
> +  .shift = 0x1.8p52,
> +  .range_val = 0x1p23,
>  };
>
> -#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23). */
> +#define C(i) sv_f64 (d->poly[i])
>
>  static svfloat64_t NOINLINE
>  special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
> @@ -42,56 +45,53 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
>    return sv_call_f64 (sin, x, y, cmp);
>  }
>
> -/* A fast SVE implementation of sin based on trigonometric
> -   instructions (FTMAD, FTSSEL, FTSMUL).
> -   Maximum observed error in 2.52 ULP:
> -   SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40
> -                                          want 0x1.10ace8f3e7868p-40. */
> +/* A fast SVE implementation of sin.
> +   Maximum observed error in 3.22 ULP:
> +   _ZGVsMxv_sin (0x1.d70eef40f39b1p+12) got -0x1.ffe9537d5dbb7p-3
> +                                       want -0x1.ffe9537d5dbb4p-3. */
>  svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg)
>  {
>    const struct data *d = ptr_barrier (&data);
>
> -  svfloat64_t r = svabs_f64_x (pg, x);
> -  svuint64_t sign
> -      = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r));
> -  svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
> +  /* Load some values in quad-word chunks to minimise memory access. */
> +  const svbool_t ptrue = svptrue_b64 ();
> +  svfloat64_t shift = sv_f64 (d->shift);
> +  svfloat64_t inv_pi_and_pi1 = svld1rq (ptrue, &d->inv_pi);
> +  svfloat64_t pi2_and_pi3 = svld1rq (ptrue, &d->pi_2);
>
> -  /* Load first two pio2-related constants to one vector. */
> -  svfloat64_t invpio2_and_pio2_1
> -      = svld1rq_f64 (svptrue_b64 (), &d->inv_pi_over_2);
> +  /* n = rint(|x|/pi). */
> +  svfloat64_t n = svmla_lane (shift, x, inv_pi_and_pi1, 0);
> +  svuint64_t odd = svlsl_x (pg, svreinterpret_u64 (n), 63);
> +  n = svsub_x (pg, n, shift);
>
> -  /* n = rint(|x|/(pi/2)). */
> -  svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
> -  svfloat64_t n = svsub_n_f64_x (pg, q, d->shift);
> +  /* r = |x| - n*(pi/2) (range reduction into -pi/2 .. pi/2). */
> +  svfloat64_t r = x;
> +  r = svmls_lane (r, n, inv_pi_and_pi1, 1);
> +  r = svmls_lane (r, n, pi2_and_pi3, 0);
> +  r = svmls_lane (r, n, pi2_and_pi3, 1);
>
> -  /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4). */
> -  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
> -  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_2);
> -  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_3);
> +  /* sin(r) poly approx. */
> +  svfloat64_t r2 = svmul_x (pg, r, r);
> +  svfloat64_t r3 = svmul_x (pg, r2, r);
> +  svfloat64_t r4 = svmul_x (pg, r2, r2);
>
> -  /* Final multiplicative factor: 1.0 or x depending on bit #0 of q. */
> -  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
> +  svfloat64_t t1 = svmla_x (pg, C (4), C (5), r2);
> +  svfloat64_t t2 = svmla_x (pg, C (2), C (3), r2);
> +  svfloat64_t t3 = svmla_x (pg, C (0), C (1), r2);
>
> -  /* sin(r) poly approx. */
> -  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
> -  svfloat64_t y = sv_f64 (0.0);
> -  y = svtmad_f64 (y, r2, 7);
> -  y = svtmad_f64 (y, r2, 6);
> -  y = svtmad_f64 (y, r2, 5);
> -  y = svtmad_f64 (y, r2, 4);
> -  y = svtmad_f64 (y, r2, 3);
> -  y = svtmad_f64 (y, r2, 2);
> -  y = svtmad_f64 (y, r2, 1);
> -  y = svtmad_f64 (y, r2, 0);
> -
> -  /* Apply factor. */
> -  y = svmul_f64_x (pg, f, y);
> -
> -  /* sign = y^sign. */
> -  y = svreinterpret_f64_u64 (
> -      sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign));
> +  svfloat64_t y = svmla_x (pg, t1, C (6), r4);
> +  y = svmla_x (pg, t2, y, r4);
> +  y = svmla_x (pg, t3, y, r4);
> +  y = svmla_x (pg, r, y, r3);
>
> +  svbool_t cmp = svacle (pg, x, d->range_val);
> +  cmp = svnot_z (pg, cmp);
>    if (__glibc_unlikely (svptest_any (pg, cmp)))
> -    return special_case (x, y, cmp);
> -  return y;
> +    return special_case (x,
> +                         svreinterpret_f64 (sveor_z (
> +                             svnot_z (pg, cmp), svreinterpret_u64 (y), odd)),
> +                         cmp);
> +
> +  /* Copy sign. */
> +  return svreinterpret_f64 (sveor_z (pg, svreinterpret_u64 (y), odd));
>  }
> diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c
> index 4d2ce7a846..590881c14b 100644
> --- a/sysdeps/aarch64/fpu/sinf_sve.c
> +++ b/sysdeps/aarch64/fpu/sinf_sve.c
> @@ -23,7 +23,7 @@ static const struct data
>  {
>    float poly[4];
>    /* Pi-related values to be loaded as one quad-word and used with
> -     svmla_lane_f32. */
> +     svmla_lane. */
>    float negpi1, negpi2, negpi3, invpi;
>    float shift;
>  } data = {
> @@ -57,40 +57,42 @@ svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg)
>  {
>    const struct data *d = ptr_barrier (&data);
>
> -  svfloat32_t ax = svabs_f32_x (pg, x);
> -  svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x),
> -                                 svreinterpret_u32_f32 (ax));
> -  svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal);
> +  svfloat32_t ax = svabs_x (pg, x);
> +  svuint32_t sign
> +      = sveor_x (pg, svreinterpret_u32 (x), svreinterpret_u32 (ax));
> +  svbool_t cmp = svcmpge (pg, svreinterpret_u32 (ax), RangeVal);
>
>    /* pi_vals are a quad-word of helper values - the first 3 elements contain
>       -pi in extended precision, the last contains 1 / pi. */
> -  svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &d->negpi1);
> +  svfloat32_t pi_vals = svld1rq (svptrue_b32 (), &d->negpi1);
>
>    /* n = rint(|x|/pi). */
> -  svfloat32_t n = svmla_lane_f32 (sv_f32 (d->shift), ax, pi_vals, 3);
> -  svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31);
> -  n = svsub_n_f32_x (pg, n, d->shift);
> +  svfloat32_t n = svmla_lane (sv_f32 (d->shift), ax, pi_vals, 3);
> +  svuint32_t odd = svlsl_x (pg, svreinterpret_u32 (n), 31);
> +  n = svsub_x (pg, n, d->shift);
>
>    /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
>    svfloat32_t r;
> -  r = svmla_lane_f32 (ax, n, pi_vals, 0);
> -  r = svmla_lane_f32 (r, n, pi_vals, 1);
> -  r = svmla_lane_f32 (r, n, pi_vals, 2);
> +  r = svmla_lane (ax, n, pi_vals, 0);
> +  r = svmla_lane (r, n, pi_vals, 1);
> +  r = svmla_lane (r, n, pi_vals, 2);
>
>    /* sin(r) approx using a degree 9 polynomial from the Taylor series
>       expansion. Note that only the odd terms of this are non-zero. */
> -  svfloat32_t r2 = svmul_f32_x (pg, r, r);
> +  svfloat32_t r2 = svmul_x (pg, r, r);
>    svfloat32_t y;
> -  y = svmla_f32_x (pg, C (2), r2, C (3));
> -  y = svmla_f32_x (pg, C (1), r2, y);
> -  y = svmla_f32_x (pg, C (0), r2, y);
> -  y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2));
> +  y = svmla_x (pg, C (2), r2, C (3));
> +  y = svmla_x (pg, C (1), r2, y);
> +  y = svmla_x (pg, C (0), r2, y);
> +  y = svmla_x (pg, r, r, svmul_x (pg, y, r2));
>
>    /* sign = y^sign^odd. */
> -  y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y),
> -                                          sveor_u32_x (pg, sign, odd)));
> +  sign = sveor_x (pg, sign, odd);
>
>    if (__glibc_unlikely (svptest_any (pg, cmp)))
> -    return special_case (x, y, cmp);
> -  return y;
> +    return special_case (x,
> +                         svreinterpret_f32 (sveor_x (
> +                             svnot_z (pg, cmp), svreinterpret_u32 (y), sign)),
> +                         cmp);
> +  return svreinterpret_f32 (sveor_x (pg, svreinterpret_u32 (y), sign));
>  }
> --
> 2.27.0
>