From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wilco Dijkstra
To: 'GNU C Library'
CC: Szabolcs Nagy
Subject: [PATCH] AArch64: Improve SVE memcpy and memmove
Date: Fri, 3 Feb 2023 13:05:39 +0000

Improve SVE memcpy/memmove by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
memcpy/memmove of 33-64 bytes become ~16% faster.

Passes regress, OK for commit?

---

diff --git a/sysdeps/aarch64/multiarch/memcpy_sve.S b/sysdeps/aarch64/multiarch/memcpy_sve.S
index f4dc214f60bf25e818eb6b8de2d4093ad0c886e1..d11be6a44301af4bfd7fa4900555b769dc58d34d 100644
--- a/sysdeps/aarch64/multiarch/memcpy_sve.S
+++ b/sysdeps/aarch64/multiarch/memcpy_sve.S
@@ -67,14 +67,15 @@ ENTRY (__memcpy_sve)
 
 	cmp	count, 128
 	b.hi	L(copy_long)
-	cmp	count, 32
+	cntb	vlen
+	cmp	count, vlen, lsl 1
 	b.hi	L(copy32_128)
-
 	whilelo	p0.b, xzr, count
-	cntb	vlen
-	tbnz	vlen, 4, L(vlen128)
-	ld1b	z0.b, p0/z, [src]
-	st1b	z0.b, p0, [dstin]
+	whilelo	p1.b, vlen, count
+	ld1b	z0.b, p0/z, [src, 0, mul vl]
+	ld1b	z1.b, p1/z, [src, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z1.b, p1, [dstin, 1, mul vl]
 	ret
 
 	/* Medium copies: 33..128 bytes. */
@@ -102,14 +103,6 @@ L(copy96):
 	stp	C_q, D_q, [dstend, -32]
 	ret
 
-L(vlen128):
-	whilelo	p1.b, vlen, count
-	ld1b	z0.b, p0/z, [src, 0, mul vl]
-	ld1b	z1.b, p1/z, [src, 1, mul vl]
-	st1b	z0.b, p0, [dstin, 0, mul vl]
-	st1b	z1.b, p1, [dstin, 1, mul vl]
-	ret
-
 	.p2align 4
 	/* Copy more than 128 bytes. */
 L(copy_long):
@@ -158,14 +151,15 @@ ENTRY (__memmove_sve)
 
 	cmp	count, 128
 	b.hi	L(move_long)
-	cmp	count, 32
+	cntb	vlen
+	cmp	count, vlen, lsl 1
 	b.hi	L(copy32_128)
-
 	whilelo	p0.b, xzr, count
-	cntb	vlen
-	tbnz	vlen, 4, L(vlen128)
-	ld1b	z0.b, p0/z, [src]
-	st1b	z0.b, p0, [dstin]
+	whilelo	p1.b, vlen, count
+	ld1b	z0.b, p0/z, [src, 0, mul vl]
+	ld1b	z1.b, p1/z, [src, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z1.b, p1, [dstin, 1, mul vl]
 	ret
 
 	.p2align 4
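
For readers less familiar with SVE predication, the small-copy path the patch
introduces can be approximated by the scalar C model below. It is only an
illustrative sketch, not code from glibc: the function name copy_two_vectors,
the MAX_VLEN constant and the explicit vlen parameter (standing in for the
value cntb returns) are assumptions made for the example. It models the two
whilelo predicates as per-lane bounds checks, and performs both loads before
either store, which appears to be why the same sequence can serve memmove
with overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural maximum SVE vector length: 2048 bits = 256 bytes. */
#define MAX_VLEN 256

/* Hypothetical scalar model of the patched small-copy path.
   vlen is the vector length in bytes (what cntb would return).
   Returns 1 if the copy was handled, 0 if count > 2 * vlen and the
   caller would branch to a larger-copy path instead. */
static int
copy_two_vectors (unsigned char *dst, const unsigned char *src,
                  size_t count, size_t vlen)
{
  unsigned char z0[MAX_VLEN], z1[MAX_VLEN];

  assert (vlen > 0 && vlen <= MAX_VLEN);

  /* cmp count, vlen, lsl 1 ; b.hi ... */
  if (count > 2 * vlen)
    return 0;

  /* Both predicated loads happen first. Lane i of p0 is active when
     i < count; lane i of p1 is active when vlen + i < count.
     Inactive lanes are zeroed (the /z qualifier) and never touch memory. */
  for (size_t i = 0; i < vlen; i++)
    {
      z0[i] = (i < count) ? src[i] : 0;
      z1[i] = (vlen + i < count) ? src[vlen + i] : 0;
    }

  /* Both predicated stores follow; inactive lanes write nothing. */
  for (size_t i = 0; i < vlen; i++)
    {
      if (i < count)
        dst[i] = z0[i];
      if (vlen + i < count)
        dst[vlen + i] = z1[i];
    }

  return 1;
}
```

With vlen = 32 (256-bit vectors, as on Neoverse V1) this path covers every
count up to 64, which is consistent with the 33-64 byte range called out in
the commit message.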