From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <0172d70e-e939-31d4-bcd8-b47f274f97d9@linaro.org>
Date: Mon, 19 Sep 2022 17:16:16 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.2.2
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
Content-Language: en-US
To: "dengjianbo@loongson.cn" <dengjianbo@loongson.cn>
Cc: joseph <joseph@codesourcery.com>, carlos <carlos@redhat.com>,
 libc-alpha <libc-alpha@sourceware.org>, "i.swmail" <i.swmail@xen0n.name>,
 xuchenghua <xuchenghua@loongson.cn>, caiyinyu <caiyinyu@loongson.cn>
References: <fe6171f8-9fde-b84a-31c8-70bb026252a4@linaro.org>
 <403f78f0-55d9-48cf-c62a-4a0462a76987@loongson.cn>
 <2022091910031722091613@loongson.cn>
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Organization: Linaro
In-Reply-To: <2022091910031722091613@loongson.cn>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-Id: <libc-alpha.sourceware.org>


On 18/09/22 23:03, dengjianbo@loongson.cn wrote:
> Hi Adhemerval,
> 
> Please see the following link for the test results comparing against the
> new generic version.
> https://sourceware.org/pipermail/libc-alpha/2022-September/142016.html
> 
> Compared with the previous patch, we further optimized strchr and
> strchrnul: 4 instructions were removed before the loop.

Do you have any breakdown of whether loop unrolling or the missing
string-fzi.h/string-fza.h is what makes the difference in the string
routines?

Checking the last iteration [1], it seems that strchr issues 2 loads
on each loop iteration and uses bit-manipulation instructions that I am
not sure the compiler could emit from generic code. Maybe we can tune the
generic implementation to get similar performance, as Richard has done
for alpha, hppa, sh, and powerpc?
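
For illustration, the kind of bit manipulation in question can be
sketched in portable C as below. This is only a sketch: the helper
names are hypothetical and are not the string-fza.h/string-fzi.h
interface, and it assumes 64-bit words and the GCC/Clang
__builtin_ctzll builtin. Byte indices are counted from the least
significant byte, which matches memory order only on little-endian.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the glibc interface.  */

#define REPEAT_01 UINT64_C (0x0101010101010101)
#define REPEAT_7F UINT64_C (0x7f7f7f7f7f7f7f7f)

/* Return a mask with bit 7 set in exactly those bytes of X that are
   zero.  The per-byte addition cannot carry into the next byte
   because 0x7f + 0x7f fits in 8 bits.  */
static inline uint64_t
zero_byte_mask (uint64_t x)
{
  return ~(((x & REPEAT_7F) + REPEAT_7F) | x | REPEAT_7F);
}

/* Return a mask with bit 7 set in exactly those bytes of X that are
   equal to C.  */
static inline uint64_t
eq_byte_mask (uint64_t x, unsigned char c)
{
  return zero_byte_mask (x ^ (REPEAT_01 * c));
}

/* Index of the first flagged byte, counting from the least
   significant byte.  MASK must be nonzero.  An architecture with a
   fast count-trailing-zeros (or similar) instruction gets it here
   through the builtin.  */
static inline unsigned int
first_byte_index (uint64_t mask)
{
  return (unsigned int) __builtin_ctzll (mask) / 8;
}
```

Whether the compiler turns the builtin into a single instruction
depends on the target, which is the part an architecture-specific
header could tune.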

I am asking because, from the brief description of the algorithm, the
general idea is essentially what my generic code aims to do (mask off
the initial bytes, use word-aligned loads and vectorized compares, extract
the final bytes), and I am hoping that the architecture would provide
string-fz{i,a}.h to get better code generation instead of pushing
for more and more hand-written assembly routines.
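
The three steps named above (mask off the initial bytes, word-aligned
loads with word-wide compares, extract the final byte) could look
roughly like this for strchrnul. It is a minimal sketch, not the
proposed generic code nor the LoongArch assembly: sketch_strchrnul and
zero_byte_mask are hypothetical names, and it assumes little-endian
64-bit words and GCC/Clang builtins. Like most word-at-a-time string
code, it reads whole aligned words that may extend before and after
the string; that never crosses a page boundary, but it is outside what
ISO C strictly permits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit 7 of each result byte is set iff that byte of X is zero.  */
static inline uint64_t
zero_byte_mask (uint64_t x)
{
  const uint64_t m = UINT64_C (0x7f7f7f7f7f7f7f7f);
  return ~(((x & m) + m) | x | m);
}

/* Hypothetical sketch: return a pointer to the first byte equal to
   C_IN in S, or to the terminating NUL if there is none.  */
char *
sketch_strchrnul (const char *s, int c_in)
{
  const uint64_t repeated_c
    = UINT64_C (0x0101010101010101) * (unsigned char) c_in;

  /* Step 1: round S down to a word boundary, load that word, and
     discard hits in the bytes that precede S (the low-order bytes
     on little-endian).  */
  size_t misalign = (uintptr_t) s % sizeof (uint64_t);
  const char *p = s - misalign;
  uint64_t word;
  memcpy (&word, p, sizeof word);
  uint64_t mask = zero_byte_mask (word) | zero_byte_mask (word ^ repeated_c);
  mask &= ~UINT64_C (0) << (misalign * 8);

  /* Step 2: one aligned load per iteration, testing for NUL and for
     C_IN in all eight bytes at once.  */
  while (mask == 0)
    {
      p += sizeof (uint64_t);
      memcpy (&word, p, sizeof word);
      mask = zero_byte_mask (word) | zero_byte_mask (word ^ repeated_c);
    }

  /* Step 3: extract the position of the first flagged byte.  */
  return (char *) p + __builtin_ctzll (mask) / 8;
}
```

Unrolling the loop to issue two loads per iteration, as the assembly
version appears to do, would be a change to step 2 only.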

[1] https://patchwork.sourceware.org/project/glibc/patch/20220916071642.2822131-2-dengjianbo@loongson.cn/

> 
> Best regards,
> Deng jianbo
> From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Date: Fri, 2 Sep 2022 09:27:33 -0300
> To: Joseph Myers <joseph@codesourcery.com>, Carlos O'Donell <carlos@redhat.com>
> CC:caiyinyu <caiyinyu@loongson.cn>, libc-alpha@sourceware.org, i.swmail@xen0n.name, xuchenghua@loongson.cn
> Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
> 
> On 15/08/22 17:46, Joseph Myers wrote:
> On Mon, 15 Aug 2022, Carlos O'Donell via Libc-alpha wrote:
> 
> On 8/15/22 04:57, caiyinyu wrote:
> Tested on LoongArch machine: gcc 13.0.0, Linux kernel 5.19.0 rc2,
> binutils branch master 2eb132bdfb9.
> 
> Could you please post microbenchmark results for these changes?
> 
> How much faster are they from the generic versions?
> 
> Note that so far we haven't merged the improved generic string functions
> that were posted a while back
> (https://sourceware.org/legacy-ml/libc-alpha/2018-01/msg00318.html is the
> version linked from https://sourceware.org/glibc/wiki/NewPorts - don't know
> if it's the most recent version). So even if assembly versions are better
> than the current generic string functions, they might not be better than
> improved generic versions with architecture-specific implementations of
> the headers to provide per-architecture tuning.
> 
> 
> And it seems that some of these newer implementations do basically what
> my patch does. The memmove is an improvement, since the generic code we
> have makes an internal libcall to memcpy (which some architectures optimize
> by implementing memcpy and memmove in the same TU, so that it is just a
> branch instead of a function call).
> 
> I will rebase and resend my improved generic string routines; I think they
> would yield very similar numbers to the str* assembly implementations proposed.