From: "Zhangxuelei (Derek)"
To: Wilco Dijkstra, Yikun Jiang
CC: "libc-alpha@sourceware.org", nd, Siddhesh Poyarekar, jiangyikun, Szabolcs Nagy
Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
Date: Mon, 21 Oct 2019 14:25:00 -0000
Message-ID: <8DC571DDDE171B4094D3D33E9685917BD87078@DGGEMI529-MBX.china.huawei.com>

Hi Wilco,

Thanks for your reply and suggestions.

> So this makes it highly desirable to improve the generic versions
> of string functions.

We completely agree. We would also like to contribute our changes to the generic versions, because most of our changes are based on them. We had a misunderstanding: we thought the ifunc variants were the general implementations in glibc. :)

However, there are two types of patches:

1. Improvements based on the generic version. There is no doubt that we should contribute these to the generic version.
2. Kunpeng-specific implementations, such as the memcpy patch, which address characteristics specific to the Kunpeng CPU. We hope we can add these via ifunc so that this kind of patch can be enabled.

In addition, is there any other work we need to cover if we contribute to the generic version?

> Note that memchr_strlen significantly outperforms the fastest strlen
> on sizes larger than 256, so I don't think that using uminv to test
> for zeroes is the fastest approach.

Indeed, but memchr_strlen performs poorly below 256 bytes. If we mix this method into the current version, we may need to keep a length count and check in each loop iteration whether it has exceeded 256 bytes. Is that cheap? We also think small sizes are more important for strlen.

Finally, we will submit the other generic implementations as soon as possible, and it would be good if you could review these two patches first :)

[1] memrchr: already submitted as a generic version. See:
https://sourceware.org/ml/libc-alpha/2019-10/msg00526.html
[2] memcpy/memmove: the Kunpeng-specific version.
https://sourceware.org/ml/libc-alpha/2019-10/msg00522.html
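
For reference, the mixed strlen approach discussed above might look roughly like the following in C. This is only a sketch under assumptions, not the actual AArch64 assembly: the name `hybrid_strlen` and the macro `HYBRID_THRESHOLD` are hypothetical, the plain byte loop merely stands in for the optimized small-size path, and the second phase assumes a memchr-based scan similar in spirit to memchr_strlen. It shows one way the 256-byte check can bound the first loop rather than being a separate counter compared on every iteration; whether that is cheap enough in assembly would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical crossover point, taken from the 256-byte figure in the
   discussion; a real value would have to come from benchmarks.  */
#define HYBRID_THRESHOLD 256

/* Hypothetical helper for illustration only; not a glibc function.  */
static size_t
hybrid_strlen (const char *s)
{
  /* Phase 1: a simple byte loop stands in for the path tuned for
     small sizes.  The loop bound doubles as the threshold check.  */
  for (size_t i = 0; i < HYBRID_THRESHOLD; i++)
    if (s[i] == '\0')
      return i;

  /* Phase 2: no terminator in the first HYBRID_THRESHOLD bytes, so
     hand the rest to memchr.  The string is NUL-terminated, so the
     search stops at the terminator before the huge bound matters.  */
  const char *rest = s + HYBRID_THRESHOLD;
  const char *end = memchr (rest, 0, PTRDIFF_MAX);
  return (size_t) (end - s);
}
```

With this shape, short strings never reach the memchr call, and long strings pay for the threshold only once, at the point where the first loop exits.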