From: Wilco Dijkstra
To: "naohirot@fujitsu.com"
CC: 'GNU C Library', Szabolcs Nagy
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Tue, 20 Apr 2021 16:00:32 +0000
List-Id: Libc-alpha mailing list

Hi Naohiro,

> Yes, I observed that just " hint #0x22" is inserted.
> The benchtest results show that the A64FX performance of size less than 100B with
> BTI is slower than ASIMD, but without BTI is faster than ASIMD.
> And the A64FX performance of 512B with BTI 4Gbps/sec slower than without BTI.

That's unfortunate - it seems like the hint is very slow, maybe even serializing...
We can work around it for now in GLIBC, but at some point distros will start to insert
BTI instructions by default, and then the performance hit will be bad.

> So if distinct degradation happens only on A64FX, I'd like to add another
> ENTRY macro in sysdeps/aarch64/sysdep.h such as:

I think the best option for now is to change BTI_C into NOP if AARCH64_HAVE_BTI
is not set. This avoids creating alignment issues in existing code (which is written
to assume the hint is present) and works for all string functions.

Cheers,
Wilco