Date: Wed, 1 Dec 2021 10:04:27 +0000
From: Szabolcs Nagy
To: Wilco Dijkstra
Cc: "naohirot@fujitsu.com", 'GNU C Library'
Subject: Re: [PATCH v2] AArch64: Improve A64FX memcpy
The 10/14/2021 15:53, Wilco Dijkstra via Libc-alpha wrote:
> Hi Naohiro,
>
> This is v2 of the A64FX memcpy - in the end I decided on a complete rewrite.
> Performance is improved by streamlining the code, aligning to vector size in
> large copies and using a single unrolled loop for all sizes. The codesize for
> memcpy and memmove goes down from 1796 bytes to 868 bytes (only 70%
> larger than memcpy_advsimd.S).
>
> Performance is better in all cases: bench-memcpy-random is 2.3% faster overall,
> bench-memcpy-large is 33% faster for large sizes, bench-memcpy-walk is 25%
> faster for small sizes and 20% for the largest sizes. The geomean of all tests in
> bench-memcpy is 5.1% better, and total time is reduced by 4%.
>
> Passes GLIBC regress, OK for commit?

I am waiting for confirmation that this is what we want for A64FX.
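Two of the ideas named in the quoted patch description, aligning the destination to the vector size for large copies and copying with an unrolled loop, can be illustrated with a minimal portable C sketch. This is not the patch's code (the patch is AArch64 assembly); `sketch_memcpy`, `copy_vec` and `VEC_SIZE` are hypothetical names, `VEC_SIZE = 64` assumes A64FX's 512-bit SVE vector length, and the small-size byte loop and overlapping tail store are assumptions of this sketch rather than details taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector width: 512-bit SVE on A64FX = 64 bytes. */
#define VEC_SIZE 64

/* Copy one VEC_SIZE chunk; stands in for a vector load/store pair.
   A fixed-size memcpy is typically lowered by the compiler to wide moves. */
static inline void copy_vec(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, VEC_SIZE);
}

/* Sketch only: dst and src must not overlap (memcpy semantics). */
void *sketch_memcpy(void *restrict dstv, const void *restrict srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;
    unsigned char *dend = dst + n;
    const unsigned char *send = src + n;

    if (n < VEC_SIZE) {                 /* small copies: plain byte loop */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        return dstv;
    }

    /* Copy the first vector unaligned, then advance both pointers so that
       dst sits on the next VEC_SIZE boundary (skip is in 1..VEC_SIZE). */
    copy_vec(dst, src);
    size_t skip = VEC_SIZE - ((uintptr_t)dst & (VEC_SIZE - 1));
    dst += skip;
    src += skip;

    /* Main loop over the aligned middle part, unrolled by 4 vectors. */
    while ((size_t)(dend - dst) > 4 * VEC_SIZE) {
        copy_vec(dst, src);
        copy_vec(dst + VEC_SIZE, src + VEC_SIZE);
        copy_vec(dst + 2 * VEC_SIZE, src + 2 * VEC_SIZE);
        copy_vec(dst + 3 * VEC_SIZE, src + 3 * VEC_SIZE);
        dst += 4 * VEC_SIZE;
        src += 4 * VEC_SIZE;
    }
    /* Remaining whole vectors. */
    while ((size_t)(dend - dst) > VEC_SIZE) {
        copy_vec(dst, src);
        dst += VEC_SIZE;
        src += VEC_SIZE;
    }
    /* Tail: one vector ending exactly at the end, possibly overlapping
       bytes already written; valid because n >= VEC_SIZE here. */
    copy_vec(dend - VEC_SIZE, send - VEC_SIZE);
    return dstv;
}
```

Copying the first vector unaligned and then rounding `dst` up means every store in the main loop is aligned, at the cost of rewriting at most VEC_SIZE - 1 bytes; the overlapping final store avoids a separate scalar tail loop.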