From: Adhemerval Zanella Netto
Organization: Linaro
To: "dengjianbo@loongson.cn"
Cc: joseph, carlos, libc-alpha, "i.swmail", xuchenghua, caiyinyu
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
Date: Mon, 19 Sep 2022 17:16:16 -0300
Message-ID: <0172d70e-e939-31d4-bcd8-b47f274f97d9@linaro.org>
In-Reply-To: <2022091910031722091613@loongson.cn>
References: <403f78f0-55d9-48cf-c62a-4a0462a76987@loongson.cn> <2022091910031722091613@loongson.cn>

On 18/09/22 23:03, dengjianbo@loongson.cn wrote:
> Hi Adhemerval,
>
> Please kindly see the following link for the test results of comparing with
> new generic version.
> https://sourceware.org/pipermail/libc-alpha/2022-September/142016.html
>
> Comparing with the previous patch, we further optimized strchr and
> strchrnul, 4 instructions were removed before the loop.

Do you have any breakdown of whether loop unrolling or the missing
string-fzi.h/string-fza.h is what makes the difference in the string
routines?

Checking the last iteration [1], it seems that strchr issues 2 loads on
each loop iteration and uses bit-manipulation instructions that I am not
sure the compiler could emit from generic code. Maybe we can tune the
generic implementation to get similar performance, as Richard has done for
alpha, hppa, sh, and powerpc?

I am asking because, from the brief description of the algorithm, the
general idea is essentially what my generic code aims to do (mask off the
initial bytes, use word-aligned loads and vectorized compares, extract the
final bytes), and I am hoping that the architecture would provide
string-fz{i,a}.h to get better code generation instead of pushing for more
and more hand-written assembly routines.

[1] https://patchwork.sourceware.org/project/glibc/patch/20220916071642.2822131-2-dengjianbo@loongson.cn/

>
> Best regards,
> Deng jianbo
> From: Adhemerval Zanella Netto
> Date: Fri, 2 Sep 2022 09:27:33 -0300
> To: Joseph Myers, Carlos O'Donell
> CC: caiyinyu, libc-alpha@sourceware.org, i.swmail@xen0n.name, xuchenghua@loongson.cn
> Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
>
> On 15/08/22 17:46, Joseph Myers wrote:
> On Mon, 15 Aug 2022, Carlos O'Donell via Libc-alpha wrote:
>
> On 8/15/22 04:57, caiyinyu wrote:
> Tested on LoongArch machine: gcc 13.0.0, Linux kernel 5.19.0 rc2,
> binutils branch master 2eb132bdfb9.
>
> Could you please post microbenchmark results for these changes?
>
> How much faster are they from the generic versions?
>
> Note that so far we haven't merged the improved generic string functions
> that were posted a while back
> (https://sourceware.org/legacy-ml/libc-alpha/2018-01/msg00318.html is the
> version linked from https://sourceware.org/glibc/wiki/NewPorts - don't know
> if it's the most recent version). So even if assembly versions are better
> than the current generic string functions, they might not be better than
> improved generic versions with architecture-specific implementations of the
> headers to provide per-architecture tuning.
>
>
> And it seems that some of these newer implementations do basically what my
> patch does. The memmove is an improvement since the generic code we
> have does an internal libcall to memcpy (which some architectures optimize
> by implementing memcpy and memmove in the same TU so that it is just a
> branch instead of a function call).
>
> I will rebase and resend my improved generic string functions; I think they
> would yield very similar numbers to the str* assembly implementations
> proposed.
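
For illustration, the generic approach described in the reply (mask off the
initial bytes, use word-aligned loads and vectorized compares, extract the
final bytes) could be sketched in C roughly as follows. This is a minimal
sketch under stated assumptions, not the code from the LoongArch patch or
from the generic string series: the names sketch_strchr, word_t, zero_bytes,
head_mask and first_flag are hypothetical. The zero-byte test and the index
extraction are the two pieces that an architecture would presumably tune
through string-fza.h and string-fzi.h. Like any word-at-a-time routine, it
reads whole aligned words and so may touch bytes outside the string, which
ISO C does not permit in general.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;            /* hypothetical name for the machine word */

#define ONES ((word_t) -1 / 0xff)    /* 0x01 in every byte */
#define LOW7 (ONES * 0x7f)           /* 0x7f in every byte */

/* Return 0x80 in exactly those bytes of X that are zero, 0 elsewhere.  */
static word_t zero_bytes(word_t x)
{
  return ~(((x & LOW7) + LOW7) | x | LOW7);
}

/* Word with 0xff in the first N bytes (memory order), 0 elsewhere.
   Built through memory so that no endianness assumption is needed.  */
static word_t head_mask(size_t n)
{
  unsigned char b[sizeof(word_t)] = { 0 };
  word_t w;
  memset(b, 0xff, n);
  memcpy(&w, b, sizeof w);
  return w;
}

/* Memory-order index of the first flagged byte in non-zero FLAGS.
   A tuned version would use a count-zeros instruction here instead.  */
static size_t first_flag(word_t flags)
{
  unsigned char b[sizeof(word_t)];
  size_t i = 0;
  memcpy(b, &flags, sizeof b);
  while (b[i] == 0)
    i++;
  return i;
}

/* Sketch of strchr: same contract as the standard function.  */
char *sketch_strchr(const char *s, int c)
{
  size_t ofs = (uintptr_t) s % sizeof(word_t);
  const char *p = s - ofs;                    /* round down to a word boundary */
  word_t repeated_c = ONES * (unsigned char) c;
  word_t w, flags;

  /* First word: flag terminator and match bytes, then mask off the
     bytes that precede the start of the string.  */
  memcpy(&w, p, sizeof w);
  flags = (zero_bytes(w) | zero_bytes(w ^ repeated_c)) & ~head_mask(ofs);

  /* Main loop: one aligned word load and two vectorized compares.  */
  while (flags == 0)
    {
      p += sizeof(word_t);
      memcpy(&w, p, sizeof w);
      flags = zero_bytes(w) | zero_bytes(w ^ repeated_c);
    }

  /* Final bytes: locate the first flagged byte and classify it.  */
  p += first_flag(flags);
  return *p == (char) c ? (char *) p : NULL;
}
```

The helpers go through memcpy and a byte array to stay independent of
endianness and aliasing rules; a tuned implementation would replace
first_flag with a single count-leading/trailing-zeros operation, which is
the kind of per-architecture choice the thread is discussing.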