From: "naohirot@fujitsu.com"
To: 'Wilco Dijkstra'
CC: 'GNU C Library', Szabolcs Nagy
Subject: RE: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Thu, 6 May 2021 10:01:15 +0000
List-Id: Libc-alpha mailing list

Hi Wilco,

Thanks for the comments. I applied all of them to both memcpy/memmove and
memset, except (3), the alignment code, for memset.
The latest code is memcpy/memmove [1] and memset [2] in the patch-20210317 [3]
branch, chosen by evaluating the performance data shown below.

[1] https://github.com/NaohiroTamura/glibc/blob/d2ea23703fc45cbfe4a8f27c759b0b23722e17a4/sysdeps/aarch64/multiarch/memcpy_a64fx.S
[2] https://github.com/NaohiroTamura/glibc/blob/d2ea23703fc45cbfe4a8f27c759b0b23722e17a4/sysdeps/aarch64/multiarch/memset_a64fx.S
[3] https://github.com/NaohiroTamura/glibc/commits/patch-20210317

> From: Wilco Dijkstra
> I've only looked at memcpy so far. My comments on memcpy:
>
> (1) Improve the tail code in unroll4/2/1/last to do the reverse of
>     shortcut_for_small_size - basically there is no need for loops or lots of
>     branches.

I updated the tail code of both memcpy/memmove [4] and memset [5], and
replaced the small-size code of memset [5].
The performance is shown as "whilelo" in the Google Sheet graphs for
memcpy/memmove [6] and memset [7].

[4] https://github.com/NaohiroTamura/glibc/commit/f7d9d7b22814affdd89cf291905b9c6601e2031d
[5] https://github.com/NaohiroTamura/glibc/commit/b79d6731f800a56be66c895c035b791ca5176bbb
[6] https://docs.google.com/spreadsheets/d/1Rh-bwF6dpWqoOCbL2epogUPn4I2Emd0NiFgoEOPaujM/edit
[7] https://docs.google.com/spreadsheets/d/1TS0qFhyR_06OyqaRHYAdCKxwvRz7f1T8jI7Pu6x2GIk/edit

> (2) Rather than start with L2, check for n > L2_SIZE && vector_length == 64 and
>     start with the vl_agnostic case. Copies > L2_SIZE will be very rare so it's
>     best to handle the common case first.

I changed the order in both memcpy/memmove [8] and memset [9].
The performance is shown as "agnostic1st" in the Google Sheet graphs for
memcpy/memmove [6] and memset [7].

[8] https://github.com/NaohiroTamura/glibc/commit/c0d7e39aa4aefe3d7b7d2a8a7c220150a0eb78fe
[9] https://github.com/NaohiroTamura/glibc/commit/d2ea23703fc45cbfe4a8f27c759b0b23722e17a4

> (3) The alignment code can be significantly simplified. Why not just process
>     4 vectors unconditionally and then align the pointers? That avoids all the
>     complex code and is much faster.
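
To make suggestion (3) concrete, here is a rough C model of "process 4 vectors
unconditionally and then align the pointers". It is only a sketch and not the
A64FX assembly in [1]: the function name is hypothetical, and it assumes
64-byte vectors and a 256-byte cache line (4 vectors), as the discussion of
vector_length == 64 and "3 remaining vectors" implies. The bytes between the
head and the aligned pointer are copied twice, which is harmless for memcpy
because source and destination do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: 64-byte vectors, 4 vectors per 256-byte cache line. */
enum { VEC = 64, LINE = 4 * VEC };

/* Hypothetical model: copy n >= LINE bytes, aligning dst after the head. */
static void copy_align_sketch(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    assert(n >= LINE);

    /* Head: copy 4 vectors without looking at the alignment at all. */
    memcpy(dst, src, LINE);

    /* Advance to the next LINE boundary of dst (skip is in 1..LINE), so
       every skipped byte has already been written by the head copy. */
    size_t skip = LINE - ((uintptr_t)dst & (LINE - 1));
    dst += skip;
    src += skip;
    n -= skip;

    /* Main loop: dst is now cache-line aligned. */
    while (n >= LINE) {
        memcpy(dst, src, LINE);
        dst += LINE;
        src += LINE;
        n -= LINE;
    }

    /* Tail: fewer than LINE bytes remain. */
    memcpy(dst, src, n);
}
```

The point of the design is that no branch depends on the initial alignment;
whether it is actually faster has to be measured on the target.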

In terms of memcpy/memmove, I tried 4 patterns, "simplifiedL2algin" [10],
"simplifiedL2algin2" [11], "agnosticVLalign" [12], and "noalign" [13], as shown
in the Google Sheet graph [14].
"simplifiedL2algin" [10] simplified the alignment to 4 whilelo,
"simplifiedL2algin2" [11] simplified it to 2 whilelo or 4 whilelo,
"agnosticVLalign" [12] added alignment code to L(vl_agnostic), and
"noalign" [13] removed all alignments.
"agnosticVLalign" [12] and "noalign" [13] didn't improve the performance, so
these commits are kept in the patch-20210317-memcpy-alignment branch [15].

[10] https://github.com/NaohiroTamura/glibc/commit/dd4ede78ec4d74e61a4dc3166fc8586168c4e410
[11] https://github.com/NaohiroTamura/glibc/commit/dd246ff01d59e4e91d10261cd070baae07c0093e
[12] https://github.com/NaohiroTamura/glibc/commit/35b8057d91024bf41595d38d94b2c3c76bdfd6b0
[13] https://github.com/NaohiroTamura/glibc/commit/b1f16f3e738152a5c0f3441201058b48901b4910
[14] https://docs.google.com/spreadsheets/d/1REBslxd56kMDMiXHAtRkBn4IaUO7AVmgvGldJl5qc58/edit
[15] https://github.com/NaohiroTamura/glibc/commits/patch-20210317-memcpy-alignment

In terms of memset, I tried 4 patterns too, "VL/CL-align" [16], "CL-align" [17],
"CL-align2" [18], and "noalign" [19], as shown in the Google Sheet graph [20].
"VL/CL-align" [16] simplified the alignment to 1 whilelo for VL and 3 whilelo
for CL, "CL-align" [17] simplified it to 4 whilelo, "CL-align2" [18] simplified
it to 2 whilelo or 4 whilelo, and "noalign" [19] removed all alignments.
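
For readers not used to the "N whilelo" shorthand: the SVE instruction
`whilelo p.b, start, n` builds a predicate whose byte lane i is active while
start + i < n (unsigned), so a few predicated stores can cover a variable
length without a loop. The C model below is only an illustration under the
assumption of a 64-byte vector length; the function names are hypothetical and
the `memset` calls merely stand in for predicated stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VL = 64 };  /* assumed vector length in bytes (512-bit SVE) */

/* Number of active byte lanes produced by `whilelo p.b, start, n`. */
static size_t whilelo_lanes(size_t start, size_t n)
{
    if (start >= n)
        return 0;
    return n - start < VL ? n - start : VL;
}

/* Hypothetical model of a "4 whilelo" sequence: set n <= 4 * VL bytes. */
static void set_tail_sketch(unsigned char *dst, int c, size_t n)
{
    assert(n <= 4 * VL);
    for (size_t k = 0; k < 4; k++) {
        /* Models predicate number k of the four. */
        size_t active = whilelo_lanes(k * VL, n);
        /* In hardware an all-false predicate simply stores nothing; the
           check here only keeps the C pointer arithmetic in bounds. */
        if (active != 0)
            memset(dst + k * VL, c, active);
    }
}
```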
As shown in the Google Sheet graph [20], none of the 4 patterns improved the
performance, so these commits are kept in the patch-20210317-memset-alignment
branch [21].

[16] https://github.com/NaohiroTamura/glibc/commit/2405b67a6bb8b380476967e150b35f10e0f25fe3
[17] https://github.com/NaohiroTamura/glibc/commit/a01a8ef08f3b53a691502538dabce3d5941790ff
[18] https://github.com/NaohiroTamura/glibc/commit/c8eb4467acbc97890a4f76f716a88d2dd901e083
[19] https://github.com/NaohiroTamura/glibc/commit/01ff56a9e558d650b09b0053adbc3215d269d65f
[20] https://docs.google.com/spreadsheets/d/1qT0ZkbrrL3fpEyfdjr23cbtanNyPFKN8xDo6E9Mb_YQ/edit
[21] https://github.com/NaohiroTamura/glibc/commits/patch-20210317-memset-alginment

> (4) Is there a benefit of aligning src or dst to vector size in the vl_agnostic
>     case? If so, it would be easy to align to a vector first and then if
>     n > L2_SIZE do the remaining 3 vectors to align to a full cacheline.

As tried in (3), "agnosticVLalign" [12] didn't improve the performance.

> (5) I'm not sure I understand the reason for src_notag/dest_notag. However if
>     you want to ignore tags, just change the mov src_ptr, src into AND that
>     clears the tag. There is no reason to both clear the tag and also keep the
>     original pointer and tag.

A64FX has Fujitsu's proprietary enhancement regarding the tag address.
I removed the dest_notag/src_notag macros and simplified L(dispatch) [22].
The "src" address has to be kept in order to jump to L(last) [23].

[22] https://github.com/NaohiroTamura/glibc/commit/519244f5058d0aa98634bb544bae3358f0b7b07c
[23] https://github.com/NaohiroTamura/glibc/blob/519244f5058d0aa98634bb544bae3358f0b7b07c/sysdeps/aarch64/multiarch/memcpy_a64fx.S#L399

> For memmove I would suggest to merge it with memcpy to save ~100 instructions.
> I don't understand the complexity of the L(dispatch) code - you just need a
> simple 3-instruction overlap check that branches to bwd_unroll8.
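
The overlap check mentioned in the quote can be modelled in C as a single
unsigned comparison; in assembly this is typically a subtract, a compare, and a
conditional branch. The snippet is a sketch of the general technique, not the
A64FX code itself, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when dst lies inside [src, src + n), i.e. when a forward
   copy would overwrite source bytes before they have been read. */
static int needs_backward_copy(const void *dst, const void *src, size_t n)
{
    /* When dst is below src the unsigned difference wraps around to a huge
       value, so one comparison handles both orderings. */
    return (uintptr_t)dst - (uintptr_t)src < n;
}
```

Note that dst == src also takes the backward path for n > 0, which is harmless
because either copy direction then leaves the buffer unchanged.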

I simplified the L(dispatch) code to 3 instructions [24] in the commit [23].

[24] https://github.com/NaohiroTamura/glibc/blob/519244f5058d0aa98634bb544bae3358f0b7b07c/sysdeps/aarch64/multiarch/memcpy_a64fx.S#L368-L370

> I haven't looked at memset, but pretty much all the improvements apply there
> too.

So please review the latest memset [2].

> >> I think the best option for now is to change BTI_C into NOP if
> >> AARCH64_HAVE_BTI is not set. This avoids creating alignment issues in
> >> existing code (which is written to assume the hint is present) and works
> >> for all string functions.
> >
> > I updated sysdeps/aarch64/sysdep.h following your advice [1].
> >
> > [1]
> > https://github.com/NaohiroTamura/glibc/commit/c582917071e76cfed84fafb0c82cb70339294386
>
> I meant using an actual NOP in the #else case so that existing string
> functions won't change. Also note the #defines in the #if and #else need to
> be indented.

I've read the mail thread regarding BTI, but I don't think I fully understand
the problem.
BTI seems to be available from ARMv8.5, and A64FX is ARMv8.2.
Even if a distro distributes BTI-enabled binaries, BTI doesn't work on A64FX.
So the BTI_J macro, at least, can be removed from the A64FX IFUNC code, because
the A64FX IFUNC code is executed only on A64FX.
Are we discussing the BTI_C code which is not in the IFUNC code?

Thanks.
Naohiro