From: Wilco Dijkstra
To: "libc-stable@sourceware.org"
CC: nd
Subject: [2.27 COMMITTED][AArch64] Backport memcmp improvements
Date: Tue, 01 Jan 2019 00:00:00 -0000

commit 062139f233a9ef94a86b91850b942d5fa991ecbe
Author: Siddhesh Poyarekar
Date:   Tue Mar 6 19:22:39 2018 +0530

    aarch64: Optimized memcmp for medium to large sizes
    
    This improved memcmp provides a fast path for compares of up to 16
    bytes and then compares 16 bytes at a time, thus optimizing loads
    from both sources.  The glibc memcmp microbenchmark retains its
    performance (within ~1 ns) for smaller compare sizes and reduces
    execution time by up to 31% for compares of up to 4K on the APM
    Mustang.  On Qualcomm Falkor the reduction reaches almost 48%, i.e.
    nearly a 2x speedup for sizes of 2K and above.
    
        * sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a time.
    
    (cherry picked from commit 30a81dae5b752f8aa5f96e7f7c341ec57cba3585)

commit f3e2add2130797967287ee55eecacd570e456d2a
Author: Siddhesh Poyarekar
Date:   Fri Feb 2 10:15:20 2018 +0530

    aarch64: Use the L() macro for labels in memcmp
    
    The L() macro makes the assembly a bit more readable.
    
        * sysdeps/aarch64/memcmp.S: Use L() macro for labels.
    
    (cherry picked from commit 84c94d2fd90d84ae7e67657ee8e22c2d1b796f63)

diff --git a/ChangeLog b/ChangeLog
index cb36feb..0374576 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2019-09-06  Siddhesh Poyarekar
 
+	* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
+	time.
+
+2019-09-06  Siddhesh Poyarekar
+
 	* sysdeps/aarch64/memcmp.S: Use L() macro for labels.
 
 2019-07-15  Adhemerval Zanella
diff --git a/sysdeps/aarch64/memcmp.S b/sysdeps/aarch64/memcmp.S
index ecd1206..8325d04 100644
--- a/sysdeps/aarch64/memcmp.S
+++ b/sysdeps/aarch64/memcmp.S
@@ -34,9 +34,12 @@
 /* Internal variables.  */
 #define data1		x3
 #define data1w		w3
-#define data2		x4
-#define data2w		w4
-#define tmp1		x5
+#define data1h		x4
+#define data2		x5
+#define data2w		w5
+#define data2h		x6
+#define tmp1		x7
+#define tmp2		x8
 
 ENTRY_ALIGN (memcmp, 6)
 	DELOUSE (0)
@@ -46,39 +49,70 @@ ENTRY_ALIGN (memcmp, 6)
 	subs	limit, limit, 8
 	b.lo	L(less8)
 
-	/* Limit >= 8, so check first 8 bytes using unaligned loads.  */
 	ldr	data1, [src1], 8
 	ldr	data2, [src2], 8
-	and	tmp1, src1, 7
-	add	limit, limit, tmp1
+	cmp	data1, data2
+	b.ne	L(return)
+
+	subs	limit, limit, 8
+	b.gt	L(more16)
+
+	ldr	data1, [src1, limit]
+	ldr	data2, [src2, limit]
+	b	L(return)
+
+L(more16):
+	ldr	data1, [src1], 8
+	ldr	data2, [src2], 8
 	cmp	data1, data2
 	bne	L(return)
 
+	/* Jump directly to comparing the last 16 bytes for 32 byte (or less)
+	   strings.  */
+	subs	limit, limit, 16
+	b.ls	L(last_bytes)
+
+	/* We overlap loads between 0-32 bytes at either side of SRC1 when we
+	   try to align, so limit it only to strings larger than 128 bytes.  */
+	cmp	limit, 96
+	b.ls	L(loop16)
+
 	/* Align src1 and adjust src2 with bytes not yet done.  */
+	and	tmp1, src1, 15
+	add	limit, limit, tmp1
 	sub	src1, src1, tmp1
 	sub	src2, src2, tmp1
 
-	subs	limit, limit, 8
-	b.ls	L(last_bytes)
-
-	/* Loop performing 8 bytes per iteration using aligned src1.
-	   Limit is pre-decremented by 8 and must be larger than zero.
-	   Exit if <= 8 bytes left to do or if the data is not equal.  */
+	/* Loop performing 16 bytes per iteration using aligned src1.
+	   Limit is pre-decremented by 16 and must be larger than zero.
+	   Exit if <= 16 bytes left to do or if the data is not equal.  */
 	.p2align 4
-L(loop8):
-	ldr	data1, [src1], 8
-	ldr	data2, [src2], 8
-	subs	limit, limit, 8
-	ccmp	data1, data2, 0, hi  /* NZCV = 0b0000.  */
-	b.eq	L(loop8)
+L(loop16):
+	ldp	data1, data1h, [src1], 16
+	ldp	data2, data2h, [src2], 16
+	subs	limit, limit, 16
+	ccmp	data1, data2, 0, hi
+	ccmp	data1h, data2h, 0, eq
+	b.eq	L(loop16)
 
 	cmp	data1, data2
 	bne	L(return)
+	mov	data1, data1h
+	mov	data2, data2h
+	cmp	data1, data2
+	bne	L(return)
 
-	/* Compare last 1-8 bytes using unaligned access.  */
+	/* Compare last 1-16 bytes using unaligned access.  */
 L(last_bytes):
-	ldr	data1, [src1, limit]
-	ldr	data2, [src2, limit]
+	add	src1, src1, limit
+	add	src2, src2, limit
+	ldp	data1, data1h, [src1]
+	ldp	data2, data2h, [src2]
+	cmp	data1, data2
+	bne	L(return)
+	mov	data1, data1h
+	mov	data2, data2h
+	cmp	data1, data2
 
 	/* Compare data bytes and set return value to 0, -1 or 1.  */
 L(return):