Message-ID: <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
From: Xi Ruoyao
To: Adhemerval Zanella Netto, "dengjianbo@loongson.cn"
Cc: libc-alpha, caiyinyu, xuchenghua, "i.swmail", joseph
Date: Tue, 20 Sep 2022 17:54:29 +0800
In-Reply-To: <0172d70e-e939-31d4-bcd8-b47f274f97d9@linaro.org>

On Mon, 2022-09-19 at 17:16 -0300, Adhemerval Zanella Netto via
Libc-alpha wrote:

> Do you have any breakdown if either loop unrolling or missing string-fzi.h/
> string-fza.h is what is making difference in string routines?

It looks like there are some difficulties...

LoongArch does not have a dedicated instruction for finding a zero byte
among the 8 bytes in a register (I guess the LoongArch SIMD eXtension
will provide such an instruction, but the full LSX manual is not
published yet and some LoongArch processors may lack LSX). So the
assembly code submitted by dengjianbo relies on a register to cache the
bit pattern 0x0101010101010101. We can't just rematerialize it (with 3
instructions) in has_zero or has_eq etc., or the performance will likely
be horribly bad.

> Checking on last iteration [1], it seems that strchr is issuing 2 loads
> on each loop iteration and using bit-manipulation instruction that I am
> not sure compiler could emit with generic code. Maybe we can tune the
> generic implementation to get similar performance, as Richard has done
> for alpha, hppa, sh, and powerpc?
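For reference, the zero-byte test that has_zero and has_eq need can be
sketched in portable C as below. This is only an illustration of the
0x0101010101010101 trick, not the code from dengjianbo's patch and not
the generic string-fzi.h. The names zero_byte_mask, eq_byte_mask and
first_zero_byte_le are made up for this example, and __builtin_ctzll
assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not from the patch or glibc.  */
#define REPEAT_01 UINT64_C(0x0101010101010101)
#define REPEAT_80 UINT64_C(0x8080808080808080)

/* Nonzero iff at least one of the 8 bytes of X is zero.  The lowest
   set bit always marks the lowest-addressed zero byte of a
   little-endian word; bits above it may be false positives caused by
   borrow propagation.  */
static inline uint64_t
zero_byte_mask (uint64_t x)
{
  return (x - REPEAT_01) & ~x & REPEAT_80;
}

/* Nonzero iff at least one byte of X equals C.  Multiplying C by
   REPEAT_01 broadcasts it to all 8 bytes; XOR turns matches into
   zero bytes.  */
static inline uint64_t
eq_byte_mask (uint64_t x, unsigned char c)
{
  return zero_byte_mask (x ^ (REPEAT_01 * c));
}

/* Index (0..7) of the first zero byte of X, counting from the least
   significant byte.  X must contain a zero byte.  */
static inline unsigned int
first_zero_byte_le (uint64_t x)
{
  uint64_t m = zero_byte_mask (x);
  assert (m != 0);   /* __builtin_ctzll (0) is undefined.  */
  return (unsigned int) __builtin_ctzll (m) / 8;
}
```

If helpers like these are inlined into the loop, the hope is that the
compiler keeps both constants in registers across iterations, which is
what the hand-written assembly does explicitly.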
>
> I am asking because from the brief description of the algorithm, the
> general idea is essentially what my generic code aims to do (mask-off
> initial bytes, use word-aligned load and vectorized compares, extract
> final bytes), and I am hoping that architecture would provide
> string-fz{i,a}.h to get better code generation instead of pushing
> for more and more hand-write assembly routines.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University