From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 6 Feb 2023 14:29:24 +0000
From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH] AArch64: Improve SVE memcpy and memmove
Content-Type: text/plain; charset=utf-8

The 02/03/2023 13:05, Wilco Dijkstra wrote:
> Improve SVE memcpy/memmove by copying 2 vectors if the size is small enough.
> This improves performance of random memcpy by ~9% on Neoverse V1, and
> memcpy/memmove of 33-64 bytes become ~16% faster.
>
> Passes regress, OK for commit?

This is ok to commit, thanks.

Reviewed-by: Szabolcs Nagy

>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memcpy_sve.S b/sysdeps/aarch64/multiarch/memcpy_sve.S
> index f4dc214f60bf25e818eb6b8de2d4093ad0c886e1..d11be6a44301af4bfd7fa4900555b769dc58d34d 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_sve.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_sve.S
> @@ -67,14 +67,15 @@ ENTRY (__memcpy_sve)
>
>  	cmp count, 128
>  	b.hi L(copy_long)
> -	cmp count, 32
> +	cntb vlen
> +	cmp count, vlen, lsl 1
>  	b.hi L(copy32_128)
> -
>  	whilelo p0.b, xzr, count
> -	cntb vlen
> -	tbnz vlen, 4, L(vlen128)
> -	ld1b z0.b, p0/z, [src]
> -	st1b z0.b, p0, [dstin]
> +	whilelo p1.b, vlen, count
> +	ld1b z0.b, p0/z, [src, 0, mul vl]
> +	ld1b z1.b, p1/z, [src, 1, mul vl]
> +	st1b z0.b, p0, [dstin, 0, mul vl]
> +	st1b z1.b, p1, [dstin, 1, mul vl]
>  	ret
>
>  /* Medium copies: 33..128 bytes. */
> @@ -102,14 +103,6 @@ L(copy96):
>  	stp C_q, D_q, [dstend, -32]
>  	ret
>
> -L(vlen128):
> -	whilelo p1.b, vlen, count
> -	ld1b z0.b, p0/z, [src, 0, mul vl]
> -	ld1b z1.b, p1/z, [src, 1, mul vl]
> -	st1b z0.b, p0, [dstin, 0, mul vl]
> -	st1b z1.b, p1, [dstin, 1, mul vl]
> -	ret
> -
>  	.p2align 4
>  /* Copy more than 128 bytes. */
>  L(copy_long):
> @@ -158,14 +151,15 @@ ENTRY (__memmove_sve)
>
>  	cmp count, 128
>  	b.hi L(move_long)
> -	cmp count, 32
> +	cntb vlen
> +	cmp count, vlen, lsl 1
>  	b.hi L(copy32_128)
> -
>  	whilelo p0.b, xzr, count
> -	cntb vlen
> -	tbnz vlen, 4, L(vlen128)
> -	ld1b z0.b, p0/z, [src]
> -	st1b z0.b, p0, [dstin]
> +	whilelo p1.b, vlen, count
> +	ld1b z0.b, p0/z, [src, 0, mul vl]
> +	ld1b z1.b, p1/z, [src, 1, mul vl]
> +	st1b z0.b, p0, [dstin, 0, mul vl]
> +	st1b z1.b, p1, [dstin, 1, mul vl]
>  	ret
>
>  	.p2align 4
>
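
For readers following the patch, here is a minimal scalar sketch in C of what the new small-copy path appears to compute. It is a model only, not the glibc implementation: the names whilelo, copy_small and MAX_VL are hypothetical, the vector length vl is passed as a parameter instead of being read with cntb, and predicates and vector registers are modelled as byte arrays. The path is taken when count <= 2 * vl, so with 32-byte vectors it covers copies of up to 64 bytes. Both loads complete before either store, which is why the same sequence can serve memmove when the buffers overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VL 256 /* Assumption for this model: largest SVE vector, in bytes. */

/* Scalar model of "whilelo pN.b, start, count":
   byte lane i is active when start + i < count. */
static void
whilelo (uint8_t *pred, size_t vl, size_t start, size_t count)
{
  for (size_t i = 0; i < vl; i++)
    pred[i] = (start + i < count);
}

/* Model of the two-vector small-copy path from the quoted diff.
   Only valid for count <= 2 * vl, mirroring "cmp count, vlen, lsl 1". */
static void
copy_small (uint8_t *dst, const uint8_t *src, size_t count, size_t vl)
{
  uint8_t p0[MAX_VL], p1[MAX_VL];
  uint8_t z0[MAX_VL], z1[MAX_VL];

  assert (vl <= MAX_VL && count <= 2 * vl);

  whilelo (p0, vl, 0, count);  /* whilelo p0.b, xzr, count */
  whilelo (p1, vl, vl, count); /* whilelo p1.b, vlen, count */

  /* ld1b ..., pN/z: inactive lanes are zeroed and do not access memory. */
  for (size_t i = 0; i < vl; i++)
    z0[i] = p0[i] ? src[i] : 0;
  for (size_t i = 0; i < vl; i++)
    z1[i] = p1[i] ? src[vl + i] : 0;

  /* st1b ..., pN: only active lanes are written. */
  for (size_t i = 0; i < vl; i++)
    if (p0[i])
      dst[i] = z0[i];
  for (size_t i = 0; i < vl; i++)
    if (p1[i])
      dst[vl + i] = z1[i];
}
```

In this model, copy_small (dst, src, 33, 32) writes bytes 0..32 of dst and leaves everything from byte 33 onwards untouched, because only lane 0 of the second predicate is active.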