From: Wilco Dijkstra
To: "Zhangxuelei (Derek)", Yikun Jiang
CC: "libc-alpha@sourceware.org", nd, Siddhesh Poyarekar, jiangyikun, Szabolcs Nagy
Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
Date: Tue, 29 Oct 2019 14:34:00 -0000
References: <8DC571DDDE171B4094D3D33E9685917BD87078@DGGEMI529-MBX.china.huawei.com>
In-Reply-To: <8DC571DDDE171B4094D3D33E9685917BD87078@DGGEMI529-MBX.china.huawei.com>

Hi Derek,

>> Note that memchr_strlen significantly outperforms the fastest strlen
>> on sizes larger than 256, so I don't think that using uminv to test
>> for zeroes is the fastest approach.
>
> Indeedly, but memchr_strlen really has poor performance before 256 bytes,

Well, that means memchr can be sped up for small sizes. While it is more
complex than strlen, it shouldn't be significantly slower.

> and if we mix this method into current version, we may need a length count
> and judge it more than 256 bytes or not in each loop, is this way cheap?

That may be possible, e.g. by unrolling the first 64-128 bytes and using a
loop optimized for throughput for anything larger (on the assumption that if
a string is larger than 128, it is likely much larger).

However, my point was that while the uminv sequence is simple and small, it's
not the fastest, so ultimately we need to find an alternative sequence which
works better for all the generic string functions which search for a
character (strlen, strnlen, memchr, memrchr, rawmemchr, strchr, strnchr,
strchrnul, strcpy, strncpy).

> And we think small size is more important for strlen.

Absolutely, handling small cases quickly is essential for all string
functions.

Wilco
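
For illustration only, the two-phase idea above (handle the first 64-128
bytes separately, then switch to a loop tuned for throughput) could be
sketched in portable C roughly as below. This is not the AArch64 assembly
under review and it does not use uminv; the word-at-a-time zero test merely
stands in for whatever sequence ends up fastest. The names sketch_find_nul,
has_zero_byte and HEAD_BYTES are hypothetical, and the sketch assumes the
caller guarantees that all len bytes are readable (a memchr-style contract).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff; the discussion above suggests 64-128 bytes. */
#define HEAD_BYTES 64

/* Nonzero iff at least one byte of x is zero (classic word-at-a-time test). */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Return the index of the first NUL in buf[0..len), or len if there is none.
   Assumption: all len bytes of buf are readable. */
size_t sketch_find_nul(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    size_t head = len < HEAD_BYTES ? len : HEAD_BYTES;

    /* Phase 1: small sizes. Simple per-byte checks keep latency low. */
    for (; i < head; i++)
        if (buf[i] == 0)
            return i;

    /* Phase 2: larger sizes. Test 16 bytes per iteration and only leave
       the loop once a chunk is known to contain a zero byte. */
    for (; len - i >= 16; i += 16) {
        uint64_t a, b;
        memcpy(&a, buf + i, 8);      /* unaligned-safe loads */
        memcpy(&b, buf + i + 8, 8);
        if (has_zero_byte(a) || has_zero_byte(b))
            break;
    }

    /* Locate the exact byte in the flagged chunk, or scan the short tail. */
    for (; i < len; i++)
        if (buf[i] == 0)
            return i;
    return len;
}
```

Locating the byte with a plain scan after the loop keeps the sketch
independent of endianness; a tuned version would more likely derive the
position directly from the comparison mask.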