From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wilco Dijkstra
To: "naohirot@fujitsu.com"
CC: 'GNU C Library', Szabolcs Nagy
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Thu, 29 Apr 2021 15:13:51 +0000
List-Id: Libc-alpha mailing list

Hi Naohiro,

> I believe that I've answered all of your comments so far.
> Please let me know if I missed something.
> If there is no further comments to the first version of this patch,
> I'd like to proceed with the preparation of the second version after
> the consecutive National holidays, Apr. 29th - May. 5th, in Japan.

I've only looked at memcpy so far. My comments on memcpy:

(1) Improve the tail code in unroll4/2/1/last to do the reverse of
    shortcut_for_small_size - basically there is no need for loops or lots of branches.

(2) Rather than start with L2, check for n > L2_SIZE && vector_length == 64 and
    start with the vl_agnostic case. Copies > L2_SIZE will be very rare so it's best to
    handle the common case first.

(3) The alignment code can be significantly simplified. Why not just process
    4 vectors unconditionally and then align the pointers? That avoids all the
    complex code and is much faster.

(4) Is there a benefit to aligning src or dst to vector size in the vl_agnostic case?
    If so, it would be easy to align to a vector first and then, if n > L2_SIZE, do the
    remaining 3 vectors to align to a full cacheline.

(5) I'm not sure I understand the reason for src_notag/dest_notag. However, if
    you want to ignore tags, just change the mov src_ptr, src into an AND that
    clears the tag. There is no reason to both clear the tag and also keep the
    original pointer and tag.

For memmove I would suggest merging it with memcpy to save ~100 instructions.
I don't understand the complexity of the L(dispatch) code - you just need a simple
3-instruction overlap check that branches to bwd_unroll8.

I haven't looked at memset, but pretty much all the improvements apply there too.

>> I think the best option for now is to change BTI_C into NOP if AARCH64_HAVE_BTI
>> is not set. This avoids creating alignment issues in existing code (which is written
>> to assume the hint is present) and works for all string functions.
>
> I updated sysdeps/aarch64/sysdep.h following your advice [1].
>
> [1] https://github.com/NaohiroTamura/glibc/commit/c582917071e76cfed84fafb0c82cb70339294386

I meant using an actual NOP in the #else case so that existing string functions
won't change. Also note the #defines in the #if and #else need to be indented.

Cheers,
Wilco
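To illustrate comment (5): clearing a tag needs only a single mask operation on the
incoming pointer, so no second untagged copy has to be kept. The C sketch below is
only an illustration. It assumes the tag occupies the top byte (bits 63:56) of a
64-bit address, and the names ADDRESS_MASK and clear_tag are hypothetical, not taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the tag lives in bits 63:56 of a 64-bit address.  */
#define ADDRESS_MASK UINT64_C(0x00ffffffffffffff)

static_assert((ADDRESS_MASK >> 56) == 0, "mask must clear the whole top byte");

/* Hypothetical helper: returns the address with the assumed tag byte cleared,
   which is what a single AND with the mask would compute.  */
static inline uint64_t clear_tag(uint64_t addr)
{
    return addr & ADDRESS_MASK;
}
```

An address that carries no tag passes through unchanged, so the mask can be applied
unconditionally.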
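The overlap check mentioned for memmove can be sketched in C as follows. This is a
minimal illustration, not the code from the patch: a forward copy is only unsafe when
dst lies inside [src, src + n), and a single unsigned subtract and compare detects
that, which in assembly would presumably be a subtract, a compare and a conditional
branch. The names needs_backward_copy and move_bytes are hypothetical, and the
byte-wise loops stand in for the unrolled vector code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero when dst lies inside [src, src + n), i.e. when a forward
   copy would overwrite source bytes before they are read.  The unsigned
   subtraction wraps to a huge value when dst < src, so one compare suffices.  */
static int needs_backward_copy(const void *dst, const void *src, size_t n)
{
    return (uintptr_t)dst - (uintptr_t)src < n;
}

/* Hypothetical byte-wise memmove built on the check above.  */
static void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (needs_backward_copy(dst, src, n)) {
        /* Copy from the last byte down to the first.  */
        while (n--)
            d[n] = s[n];
    } else {
        /* Copy from the first byte up to the last.  */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

Every other case, including dst below src with overlapping ranges, takes the forward
path, so a shared memcpy/memmove entry only needs the one extra branch.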
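The shape of the BTI_C change described above, with a real NOP in the #else branch and
indented #defines, might look roughly like the sketch below. It is not the actual
contents of sysdeps/aarch64/sysdep.h. Testing AARCH64_HAVE_BTI with #if and defaulting
it to 0 are assumptions made so the fragment compiles on its own, and STR, STR1 and
bti_c_text are hypothetical helpers that only make the expansion visible from C.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: the macro is tested with #if and defaults to 0.  */
#ifndef AARCH64_HAVE_BTI
# define AARCH64_HAVE_BTI 0
#endif

/* Either way BTI_C expands to exactly one instruction, so code written to
   assume the hint is present keeps the same layout.  */
#if AARCH64_HAVE_BTI
# define BTI_C hint 34
#else
# define BTI_C nop
#endif

/* Hypothetical helpers: stringify the expansion so it can be inspected.  */
#define STR1(x) #x
#define STR(x) STR1(x)

static_assert(sizeof(STR(BTI_C)) > 1, "BTI_C must expand to an instruction");

static const char *bti_c_text(void)
{
    return STR(BTI_C);
}
```

With AARCH64_HAVE_BTI left at 0, bti_c_text() yields the NOP spelling rather than an
empty string.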