Date: Thu, 5 Oct 2023 17:01:02 +0100
From: Szabolcs Nagy
To: Joe Ramsay
Subject: Re: [PATCH v2] aarch64: Improve vecmath sin routines
References: <20231005093138.7209-1-Joe.Ramsay@arm.com>
In-Reply-To: <20231005093138.7209-1-Joe.Ramsay@arm.com>

The 10/05/2023 10:31, Joe Ramsay wrote:
> * Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
> * Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
> * Improve register use near special-case branch
>
> Also use overloaded intrinsics for SVE.

looks good.

committed.

> ---
> Changes from v1:
> * Report new observed global max and max in [-pi/2, pi/2]
> Thanks,
> Joe
>  sysdeps/aarch64/fpu/sin_advsimd.c |  10 ++-
>  sysdeps/aarch64/fpu/sin_sve.c     | 106 ++++++++++++++++--------------
>  sysdeps/aarch64/fpu/sinf_sve.c    |  44 +++++++------
>  3 files changed, 87 insertions(+), 73 deletions(-)
>
> diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
> index 0389b334cc..3d87a1da79 100644
> --- a/sysdeps/aarch64/fpu/sin_advsimd.c
> +++ b/sysdeps/aarch64/fpu/sin_advsimd.c
> @@ -24,7 +24,6 @@ static const struct data
>    float64x2_t poly[7];
>    float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
>  } data = {
> -  /* Worst-case error is 2.8 ulp in [-pi/2, pi/2].  */
>    .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
>              V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
>              V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
> @@ -52,6 +51,15 @@ special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
>    return v_call_f64 (sin, x, y, cmp);
>  }
>
> +/* Vector (AdvSIMD) sin approximation.
> +   Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
> +   is 2.87 ULP:
> +   _ZGVnN2v_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
> +                                      want 0x1.fffffffa7dc05p-1
> +   Maximum observed error in the entire non-special domain ([-2^23, 2^23])
> +   is 3.22 ULP:
> +   _ZGVnN2v_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
> +                                       want 0x1.ffdcd125c84f8p-3.  */
>  float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
>  {
>    const struct data *d = ptr_barrier (&data);
> diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c
> index c3f450d0ea..54c8dae286 100644
> --- a/sysdeps/aarch64/fpu/sin_sve.c
> +++ b/sysdeps/aarch64/fpu/sin_sve.c
> @@ -21,20 +21,22 @@
>
>  static const struct data
>  {
> -  double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3,
> -    shift;
> +  double inv_pi, pi_1, pi_2, pi_3, shift, range_val;
> +  double poly[7];
>  } data = {
> -  /* Polynomial coefficients are hard-wired in the FTMAD instruction.  */
> +  .poly = { -0x1.555555555547bp-3, 0x1.1111111108a4dp-7, -0x1.a01a019936f27p-13,
> +            0x1.71de37a97d93ep-19, -0x1.ae633919987c6p-26,
> +            0x1.60e277ae07cecp-33, -0x1.9e9540300a1p-41, },
> +
>    .inv_pi = 0x1.45f306dc9c883p-2,
> -  .half_pi = 0x1.921fb54442d18p+0,
> -  .inv_pi_over_2 = 0x1.45f306dc9c882p-1,
> -  .pi_over_2_1 = 0x1.921fb50000000p+0,
> -  .pi_over_2_2 = 0x1.110b460000000p-26,
> -  .pi_over_2_3 = 0x1.1a62633145c07p-54,
> -  .shift = 0x1.8p52
> +  .pi_1 = 0x1.921fb54442d18p+1,
> +  .pi_2 = 0x1.1a62633145c06p-53,
> +  .pi_3 = 0x1.c1cd129024e09p-106,
> +  .shift = 0x1.8p52,
> +  .range_val = 0x1p23,
>  };
>
> -#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23).  */
> +#define C(i) sv_f64 (d->poly[i])
>
>  static svfloat64_t NOINLINE
>  special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
> @@ -42,56 +44,58 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp)
>    return sv_call_f64 (sin, x, y, cmp);
>  }
>
> -/* A fast SVE implementation of sin based on trigonometric
> -   instructions (FTMAD, FTSSEL, FTSMUL).
> -   Maximum observed error in 2.52 ULP:
> -   SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40
> -                                          want 0x1.10ace8f3e7868p-40.  */
> +/* A fast SVE implementation of sin.
> +   Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
> +   is 2.87 ULP:
> +   _ZGVsMxv_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
> +                                      want 0x1.fffffffa7dc05p-1
> +   Maximum observed error in the entire non-special domain ([-2^23, 2^23])
> +   is 3.22 ULP:
> +   _ZGVsMxv_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
> +                                       want 0x1.ffdcd125c84f8p-3.  */
>  svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg)
>  {
>    const struct data *d = ptr_barrier (&data);
>
> -  svfloat64_t r = svabs_f64_x (pg, x);
> -  svuint64_t sign
> -    = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r));
> -  svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal);
> +  /* Load some values in quad-word chunks to minimise memory access.  */
> +  const svbool_t ptrue = svptrue_b64 ();
> +  svfloat64_t shift = sv_f64 (d->shift);
> +  svfloat64_t inv_pi_and_pi1 = svld1rq (ptrue, &d->inv_pi);
> +  svfloat64_t pi2_and_pi3 = svld1rq (ptrue, &d->pi_2);
>
> -  /* Load first two pio2-related constants to one vector.  */
> -  svfloat64_t invpio2_and_pio2_1
> -    = svld1rq_f64 (svptrue_b64 (), &d->inv_pi_over_2);
> +  /* n = rint(|x|/pi).  */
> +  svfloat64_t n = svmla_lane (shift, x, inv_pi_and_pi1, 0);
> +  svuint64_t odd = svlsl_x (pg, svreinterpret_u64 (n), 63);
> +  n = svsub_x (pg, n, shift);
>
> -  /* n = rint(|x|/(pi/2)).  */
> -  svfloat64_t q = svmla_lane_f64 (sv_f64 (d->shift), r, invpio2_and_pio2_1, 0);
> -  svfloat64_t n = svsub_n_f64_x (pg, q, d->shift);
> +  /* r = |x| - n*(pi/2)  (range reduction into -pi/2 .. pi/2).  */
> +  svfloat64_t r = x;
> +  r = svmls_lane (r, n, inv_pi_and_pi1, 1);
> +  r = svmls_lane (r, n, pi2_and_pi3, 0);
> +  r = svmls_lane (r, n, pi2_and_pi3, 1);
>
> -  /* r = |x| - n*(pi/2)  (range reduction into -pi/4 .. pi/4).  */
> -  r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1);
> -  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_2);
> -  r = svmls_n_f64_x (pg, r, n, d->pi_over_2_3);
> +  /* sin(r) poly approx.  */
> +  svfloat64_t r2 = svmul_x (pg, r, r);
> +  svfloat64_t r3 = svmul_x (pg, r2, r);
> +  svfloat64_t r4 = svmul_x (pg, r2, r2);
>
> -  /* Final multiplicative factor: 1.0 or x depending on bit #0 of q.  */
> -  svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q));
> +  svfloat64_t t1 = svmla_x (pg, C (4), C (5), r2);
> +  svfloat64_t t2 = svmla_x (pg, C (2), C (3), r2);
> +  svfloat64_t t3 = svmla_x (pg, C (0), C (1), r2);
>
> -  /* sin(r) poly approx.  */
> -  svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q));
> -  svfloat64_t y = sv_f64 (0.0);
> -  y = svtmad_f64 (y, r2, 7);
> -  y = svtmad_f64 (y, r2, 6);
> -  y = svtmad_f64 (y, r2, 5);
> -  y = svtmad_f64 (y, r2, 4);
> -  y = svtmad_f64 (y, r2, 3);
> -  y = svtmad_f64 (y, r2, 2);
> -  y = svtmad_f64 (y, r2, 1);
> -  y = svtmad_f64 (y, r2, 0);
> -
> -  /* Apply factor.  */
> -  y = svmul_f64_x (pg, f, y);
> -
> -  /* sign = y^sign.  */
> -  y = svreinterpret_f64_u64 (
> -    sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign));
> +  svfloat64_t y = svmla_x (pg, t1, C (6), r4);
> +  y = svmla_x (pg, t2, y, r4);
> +  y = svmla_x (pg, t3, y, r4);
> +  y = svmla_x (pg, r, y, r3);
>
> +  svbool_t cmp = svacle (pg, x, d->range_val);
> +  cmp = svnot_z (pg, cmp);
>    if (__glibc_unlikely (svptest_any (pg, cmp)))
> -    return special_case (x, y, cmp);
> -  return y;
> +    return special_case (x,
> +                         svreinterpret_f64 (sveor_z (
> +                           svnot_z (pg, cmp), svreinterpret_u64 (y), odd)),
> +                         cmp);
> +
> +  /* Copy sign.  */
> +  return svreinterpret_f64 (sveor_z (pg, svreinterpret_u64 (y), odd));
>  }
> diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c
> index 4d2ce7a846..590881c14b 100644
> --- a/sysdeps/aarch64/fpu/sinf_sve.c
> +++ b/sysdeps/aarch64/fpu/sinf_sve.c
> @@ -23,7 +23,7 @@ static const struct data
>  {
>    float poly[4];
>    /* Pi-related values to be loaded as one quad-word and used with
> -     svmla_lane_f32.  */
> +     svmla_lane.  */
>    float negpi1, negpi2, negpi3, invpi;
>    float shift;
>  } data = {
> @@ -57,40 +57,42 @@ svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg)
>  {
>    const struct data *d = ptr_barrier (&data);
>
> -  svfloat32_t ax = svabs_f32_x (pg, x);
> -  svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x),
> -                                 svreinterpret_u32_f32 (ax));
> -  svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal);
> +  svfloat32_t ax = svabs_x (pg, x);
> +  svuint32_t sign
> +    = sveor_x (pg, svreinterpret_u32 (x), svreinterpret_u32 (ax));
> +  svbool_t cmp = svcmpge (pg, svreinterpret_u32 (ax), RangeVal);
>
>    /* pi_vals are a quad-word of helper values - the first 3 elements contain
>       -pi in extended precision, the last contains 1 / pi.  */
> -  svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &d->negpi1);
> +  svfloat32_t pi_vals = svld1rq (svptrue_b32 (), &d->negpi1);
>
>    /* n = rint(|x|/pi).  */
> -  svfloat32_t n = svmla_lane_f32 (sv_f32 (d->shift), ax, pi_vals, 3);
> -  svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31);
> -  n = svsub_n_f32_x (pg, n, d->shift);
> +  svfloat32_t n = svmla_lane (sv_f32 (d->shift), ax, pi_vals, 3);
> +  svuint32_t odd = svlsl_x (pg, svreinterpret_u32 (n), 31);
> +  n = svsub_x (pg, n, d->shift);
>
>    /* r = |x| - n*pi  (range reduction into -pi/2 .. pi/2).  */
>    svfloat32_t r;
> -  r = svmla_lane_f32 (ax, n, pi_vals, 0);
> -  r = svmla_lane_f32 (r, n, pi_vals, 1);
> -  r = svmla_lane_f32 (r, n, pi_vals, 2);
> +  r = svmla_lane (ax, n, pi_vals, 0);
> +  r = svmla_lane (r, n, pi_vals, 1);
> +  r = svmla_lane (r, n, pi_vals, 2);
>
>    /* sin(r) approx using a degree 9 polynomial from the Taylor series
>       expansion. Note that only the odd terms of this are non-zero.  */
> -  svfloat32_t r2 = svmul_f32_x (pg, r, r);
> +  svfloat32_t r2 = svmul_x (pg, r, r);
>    svfloat32_t y;
> -  y = svmla_f32_x (pg, C (2), r2, C (3));
> -  y = svmla_f32_x (pg, C (1), r2, y);
> -  y = svmla_f32_x (pg, C (0), r2, y);
> -  y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2));
> +  y = svmla_x (pg, C (2), r2, C (3));
> +  y = svmla_x (pg, C (1), r2, y);
> +  y = svmla_x (pg, C (0), r2, y);
> +  y = svmla_x (pg, r, r, svmul_x (pg, y, r2));
>
>    /* sign = y^sign^odd.  */
> -  y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y),
> -                                          sveor_u32_x (pg, sign, odd)));
> +  sign = sveor_x (pg, sign, odd);
>
>    if (__glibc_unlikely (svptest_any (pg, cmp)))
> -    return special_case (x, y, cmp);
> -  return y;
> +    return special_case (x,
> +                         svreinterpret_f32 (sveor_x (
> +                           svnot_z (pg, cmp), svreinterpret_u32 (y), sign)),
> +                         cmp);
> +  return svreinterpret_f32 (sveor_x (pg, svreinterpret_u32 (y), sign));
>  }
> --
> 2.27.0
>