From: "naohirot@fujitsu.com"
To: 'Wilco Dijkstra'
CC: 'GNU C Library', Szabolcs Nagy
Subject: RE: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Thu, 15 Apr 2021 12:20:58 +0000
List-Id: Libc-alpha mailing list

Hi Wilco-san,

Thanks for the detailed technical review!

We now have several topics to discuss, so let me focus on BTI in this mail.
I'll answer the other topics in a later mail.

> From: Wilco Dijkstra
>
> Thanks for the comprehensive reply, especially the graphs are quite useful!
> (I'd avoid adding generic_memcpy/memmove though since those are unoptimized
> C implementations).

OK, I'll withdraw the patch from the A64FX patch V2.

> > For small/medium copies, I needed to remove BTI macro from ASM ENTRY
> > in order to see the distinct performance difference between ASIMD and SVE.
> > I'll post the patch [14] with the A64FX second patch.
>
> I'm not sure I understand - the BTI macro just emits a NOP hint so it is harmless.
> We always emit it so that it works seamlessly when BTI is enabled.

Yes, I observed that only "hint #0x22" is inserted.
The benchtest results show that, for sizes less than 100B, the A64FX
performance with BTI is slower than ASIMD, but without BTI it is faster than ASIMD.
And at 512B, the A64FX performance with BTI is 4Gbps slower than without BTI.

With BTI, source code [4]
[1] https://drive.google.com/file/d/1LlyQOq7qT4d0-54uVzUtYMMMDgIiddEj/view
[2] https://drive.google.com/file/d/1C2pl-Iz_-18mkpuQTk1PhEHKsd5x0wWo/view
[3] https://drive.google.com/file/d/1eg_p1_b619KN7XLmOpxqcoI3c9o4WXd-/view
[4] https://github.com/NaohiroTamura/glibc/commit/0f45fff654d7a31b58e5d6f4dbfa31d6586f8cc2

Without BTI, source code [8]
[5] https://drive.google.com/file/d/1Mf7wxwgGb5yYBJo1eUxqvjrkp9O4EVVJ/view
[6] https://drive.google.com/file/d/1rgfFmWsM4Q3oDK8aYa_GjEQWttS0pOBF/view
[7] https://drive.google.com/file/d/1hF7oevP-MERrQ04yajtEUY8CSWe8V2EX/view
[8] https://github.com/NaohiroTamura/glibc/commit/c204a74971b3d34680964bc52ac59264b14527e3

I executed the same test on ThunderX2, and the result showed very little
difference between with BTI and without BTI, as you mentioned.
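
For reference, "hint #0x22" is the HINT-space encoding of "bti c". The
following is only a minimal sketch in C of how that instruction word is
formed, to show that it differs from a plain NOP by the immediate alone.
The helper names encode_hint and is_bti_hint are hypothetical, made up for
illustration, and are not part of glibc.

```c
#include <assert.h>
#include <stdint.h>

/* A64 HINT #imm: the base word 0xD503201F is NOP (HINT #0); the 7-bit
   immediate (CRm:op2) is placed in bits [11:5].  Hypothetical helper.  */
static uint32_t
encode_hint (unsigned int imm7)
{
  assert (imm7 <= 0x7f);
  return UINT32_C (0xD503201F) | ((uint32_t) imm7 << 5);
}

/* The BTI variants occupy HINT immediates 32 (bti), 34 (bti c),
   36 (bti j) and 38 (bti jc).  Hypothetical helper.  */
static int
is_bti_hint (unsigned int imm7)
{
  return imm7 == 0x20 || imm7 == 0x22 || imm7 == 0x24 || imm7 == 0x26;
}
```

With this, encode_hint (0x22) gives 0xD503245F, the word an assembler emits
for "bti c", while encode_hint (0) gives the NOP encoding 0xD503201F.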
So if the distinct degradation happens only on A64FX, I'd like to add another
ENTRY macro in sysdeps/aarch64/sysdep.h such as:

#define ENTRY_ALIGN_NO_BTI(name, align)         \
  .globl C_SYMBOL_NAME(name);                   \
  .type C_SYMBOL_NAME(name),%function;          \
  .p2align align;                               \
  C_LABEL(name)                                 \
  cfi_startproc;                                \
  CALL_MCOUNT

Or I'd like to change memcpy_a64fx.S and memset_a64fx.S so that they do not
use the ENTRY macro, such as:

  .globl __memcpy_a64fx
  .type __memcpy_a64fx, %function
  .p2align 6
  __memcpy_a64fx:
  cfi_startproc
  CALL_MCOUNT

What do you think?

> > And also somehow on A64FX as well as on ThunderX2 machine,
> > memcpy-random didn't start due to mprotect error.
>
> Yes it looks like the size isn't rounded up to a pagesize. It really needs the extra
> space, so changing +4096 into getpagesize () will work.

OK, I've already applied it [8].

Thanks!
Naohiro
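
A note on the mprotect point: mprotect requires a page-aligned address, so a
hard-coded 4096 stops working once the page size is larger (for example 64KB
pages on some AArch64 kernels). The following is only a minimal sketch of
rounding up by getpagesize (), assuming a buffer followed by one inaccessible
guard page. The names round_up_to_page and alloc_with_guard_page are
hypothetical, and this is not the benchtests code.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round SIZE up to a multiple of PAGE_SIZE, which must be a power of two.
   Hypothetical helper.  */
static size_t
round_up_to_page (size_t size, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (size + page_size - 1) & ~(page_size - 1);
}

/* Map SIZE bytes rounded up to whole pages, plus one extra page that is
   made inaccessible.  Returns NULL on failure.  Hypothetical helper.  */
static void *
alloc_with_guard_page (size_t size)
{
  size_t page_size = (size_t) getpagesize ();
  size_t usable = round_up_to_page (size, page_size);
  char *buf = mmap (NULL, usable + page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (buf == MAP_FAILED)
    return NULL;
  /* buf + usable is page aligned, so mprotect accepts it for any page size.  */
  if (mprotect (buf + usable, page_size, PROT_NONE) != 0)
    {
      munmap (buf, usable + page_size);
      return NULL;
    }
  return buf;
}
```

The point of the sketch is that both the guard offset and the guard length
come from the runtime page size rather than from the constant 4096.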