Message-ID: <8411c465e01de9608633f8b1fd2d82d3ef16f001.camel@xry111.site>
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
From: Xi Ruoyao
To: Adhemerval Zanella Netto, "dengjianbo@loongson.cn"
Cc: libc-alpha, caiyinyu, xuchenghua, "i.swmail", joseph
Date: Mon, 26 Sep 2022 21:49:04 +0800
In-Reply-To: <1fec4245-9eb4-108d-722e-ba36a1df0023@linaro.org>

Hi Adhemerval and Jianbo,

I've customized string-fzi.h and string-maskoff.h for LoongArch (see the attachments). With them applied on top of Adhemerval's v5 "Improve generic string routines" patch, and built with GCC and Binutils trunk, the benchmark results for strchr, strcmp, and strchrnul seem comparable to the assembly versions.

By the way, I tried unrolling the loop in strchr manually, but the compiler then produced poor code (moving words from one register to another for no reason), and the result was slower.

I haven't actually plotted the results; I just took a quick look at the numbers by eye. You can try the benchmark yourself with my headers placed in sysdeps/loongarch.
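For readers less familiar with the generic routines, the "loop in strchr" is a word-at-a-time scan. A minimal, self-contained sketch of such a loop in portable C follows. It assumes a 64-bit little-endian target (LoongArch64 is little-endian), and `repeat_bytes_c`, `find_zero_low_c` and `strchr_sketch` are hypothetical stand-ins written for illustration, not the code that was benchmarked.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-bit little-endian target, so the first byte in memory is
   the least significant byte of a loaded word.  */
typedef uint64_t op_t;

/* Hypothetical stand-in for repeat_bytes: C copied into every byte.  */
static op_t
repeat_bytes_c (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Hypothetical stand-in for find_zero_low: sets the high bit of the lowest
   zero byte of X.  Bytes above it may be flagged spuriously, which is
   harmless when only the first match is wanted.  */
static op_t
find_zero_low_c (op_t x)
{
  return (x - repeat_bytes_c (0x01)) & ~x & repeat_bytes_c (0x80);
}

char *
strchr_sketch (const char *s, int c_in)
{
  unsigned char c = (unsigned char) c_in;
  uintptr_t addr = (uintptr_t) s;
  /* Like word_containing: round S down to a word boundary.  */
  const char *wp = (const char *) (addr & ~(uintptr_t) (sizeof (op_t) - 1));
  /* Like create_mask: 0xff in each byte of the first word before S.  */
  op_t mask = ~((op_t) -1 << ((addr % sizeof (op_t)) * CHAR_BIT));
  op_t rc = repeat_bytes_c (c);
  op_t w;

  /* Whole aligned words are read, possibly including bytes outside the
     string.  glibc relies on aligned loads never crossing a page; ISO C
     gives no such guarantee, so treat this as illustration only.  */
  memcpy (&w, wp, sizeof w);

  /* Turn every byte before S into 0xff ^ (c & 0x7f): its high bit is set,
     so it is nonzero, and it can never equal C.  This is what masking with
     highbit_mask (mask) achieves.  */
  w = (w | mask) ^ (rc & mask & repeat_bytes_c (0x7f));

  for (;;)
    {
      /* High bits mark bytes that are NUL or equal to C; the lowest mark
         is always exact.  */
      op_t hit = find_zero_low_c (w) | find_zero_low_c (w ^ rc);
      if (hit != 0)
        {
          const char *p = wp + __builtin_ctzll (hit) / CHAR_BIT;
          return (unsigned char) *p == c ? (char *) p : NULL;
        }
      wp += sizeof (op_t);
      memcpy (&w, wp, sizeof w);
    }
}
```

Once the bytes before the string have been neutralized, the first, partial word needs no special-case code: the loop body is identical for every word.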
-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

[Attachment: string-maskoff.h]

/* Mask off bits.  LoongArch version.
   Copyright (C) 2022 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#ifndef _STRING_MASKOFF_H
#define _STRING_MASKOFF_H 1

#include <endian.h>
#include <limits.h>
#include <stdint.h>
#include <string-optype.h>

/* Provide a mask based on the pointer alignment that sets up non-zero
   bytes before the beginning of the word.  It is used to mask off
   undesirable bits from an aligned read from an unaligned pointer.
   For instance, on a 64 bits machine with a pointer alignment of
   3 the function returns 0x0000000000ffffff for LE and 0xffffff0000000000
   (meaning to mask off the initial 3 bytes).  */
static inline op_t
create_mask (uintptr_t i)
{
  i = i % sizeof (op_t);
  return ~(((op_t)-1) << (i * CHAR_BIT));
}

/* Set up a word with each byte being c_in.  For instance, on a 64 bits
   machine with input as 0xce the function returns 0xcececececececece.  */
static inline op_t
repeat_bytes (unsigned char c_in)
{
  /* Unsigned constant: with a plain int the product overflows for
     c_in >= 0x80.  */
  op_t r = c_in * 0x01010101U;

  _Static_assert (sizeof (op_t) == 4 || sizeof (op_t) == 8,
                  "unsupported op_t size");

  /* Copy the low 32 bits of R into its high 32 bits.  */
  if (sizeof (op_t) == 8)
    asm ("bstrins.d\t%0, %0, 63, 32" : "+r" (r));

  return r;
}

/* Based on mask created by 'create_mask', mask off the high bit of each
   byte in the mask.  It is used to mask off undesirable bits from an
   aligned read from an unaligned pointer, and also taking care to avoid
   match possible bytes meant to be matched.  For instance, on a 64 bits
   machine with a mask created from a pointer with an alignment of 3
   (0x0000000000ffffff) the function returns 0x7f7f7f0000000000 for BE
   and 0x00000000007f7f7f for LE.  */
static inline op_t
highbit_mask (op_t m)
{
  return m & repeat_bytes (0x7f);
}

/* Return the address of the op_t word containing the address P.  For
   instance on address 0x0011223344556677 and op_t with size of 8,
   it returns 0x0011223344556670.  */
static inline op_t *
word_containing (char const *p)
{
  _Static_assert (sizeof (op_t) == 4 || sizeof (op_t) == 8,
                  "unsupported op_t size");

  /* Clear the low 3 (or 2) bits of P.  */
  if (sizeof (op_t) == 8)
    asm ("bstrins.d\t%0, $zero, 2, 0" : "+r" (p));
  else
    asm ("bstrins.d\t%0, $zero, 1, 0" : "+r" (p));
  return (op_t *) p;
}

#endif /* _STRING_MASKOFF_H  */

[Attachment: string-fzi.h]

/* Zero byte detection; indexes.  LoongArch version.
   Copyright (C) 2022 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#ifndef _STRING_FZI_H
#define _STRING_FZI_H 1

#include <limits.h>
#include <endian.h>
#include <string-fza.h>
#include <gmp.h>
#include <stdlib/gmp-impl.h>
#include <stdlib/longlong.h>

/* A subroutine for the index_zero functions.  Given a test word C, return
   the (memory order) index of the first byte (in memory order) that is
   non-zero.  */
static inline unsigned int
index_first_ (op_t c)
{
  _Static_assert (sizeof (op_t) == sizeof (long), "op_t must be long");

  return __builtin_ctzl (c) / CHAR_BIT;
}

/* Similarly, but return the (memory order) index of the last byte that is
   non-zero.  */
static inline unsigned int
index_last_ (op_t c)
{
  _Static_assert (sizeof (op_t) == sizeof (long), "op_t must be long");

  return sizeof (op_t) - 1 - (__builtin_clzl (c) / CHAR_BIT);
}

/* Given a word X that is known to contain a zero byte, return the index of
   the first such within the word in memory order.  */
static inline unsigned int
index_first_zero (op_t x)
{
  x = find_zero_low (x);
  return index_first_ (x);
}

/* Similarly, but perform the search for byte equality between X1 and X2.  */
static inline unsigned int
index_first_eq (op_t x1, op_t x2)
{
  x1 = find_eq_low (x1, x2);
  return index_first_ (x1);
}

/* Similarly, but perform the search for zero within X1 or equality between
   X1 and X2.  */
static inline unsigned int
index_first_zero_eq (op_t x1, op_t x2)
{
  x1 = find_zero_eq_low (x1, x2);
  return index_first_ (x1);
}

/* Similarly, but perform the search for zero within X1 or inequality between
   X1 and X2.  */
static inline unsigned int
index_first_zero_ne (op_t x1, op_t x2)
{
  x1 = find_zero_ne_low (x1, x2);
  return index_first_ (x1);
}

/* Similarly, but search for the last zero within X.  */
static inline unsigned int
index_last_zero (op_t x)
{
  x = find_zero_all (x);
  return index_last_ (x);
}

/* Similarly, but search for the last byte equality between X1 and X2.  */
static inline unsigned int
index_last_eq (op_t x1, op_t x2)
{
  return index_last_zero (x1 ^ x2);
}

#endif /* STRING_FZI_H */
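The two `bstrins.d` asm statements are the only LoongArch-specific parts of string-maskoff.h. The sketch below is a C model of what they compute, with the instruction semantics assumed from the BSTRINS.D description in the LoongArch ISA manual (bits msbd..lsbd of rd are replaced by the low bits of rj). `bstrins_d`, `repeat_bytes_model`, `repeat_bytes_portable` and `word_containing_model` are hypothetical names used only here. The model multiplies in unsigned arithmetic, because multiplying the promoted `unsigned char` by a plain `int` constant overflows `int` once `c_in >= 0x80`.

```c
#include <stdint.h>

/* C model of "bstrins.d rd, rj, msbd, lsbd" (semantics assumed from the
   LoongArch ISA manual): bits msbd..lsbd of rd are replaced by the low
   (msbd - lsbd + 1) bits of rj; all other bits of rd are kept.
   Requires 0 <= lsbd <= msbd <= 63.  */
static uint64_t
bstrins_d (uint64_t rd, uint64_t rj, unsigned msbd, unsigned lsbd)
{
  unsigned width = msbd - lsbd + 1;
  uint64_t field = width == 64 ? UINT64_MAX : (UINT64_C (1) << width) - 1;
  return (rd & ~(field << lsbd)) | ((rj & field) << lsbd);
}

/* repeat_bytes: the multiply fills the low 32 bits, then
   "bstrins.d %0, %0, 63, 32" copies them into the high 32 bits.  */
static uint64_t
repeat_bytes_model (unsigned char c_in)
{
  uint64_t r = c_in * UINT32_C (0x01010101);
  return bstrins_d (r, r, 63, 32);
}

/* The same value without inline asm: 0x0101010101010101 * c_in.  */
static uint64_t
repeat_bytes_portable (unsigned char c_in)
{
  return (UINT64_MAX / 0xff) * c_in;
}

/* word_containing: "bstrins.d %0, $zero, 2, 0" clears address bits 2..0,
   i.e. rounds down to an 8-byte boundary.  */
static uint64_t
word_containing_model (uint64_t p)
{
  return bstrins_d (p, 0, 2, 0);
}
```

The values checked against the model are the ones quoted in the header comments (0xce and 0x0011223344556677). Avoiding materialization of the full 64-bit constant is a plausible reason for preferring the asm form, but that is an inference, not something stated in the message.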
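string-fzi.h relies on `find_zero_low` and `find_zero_all` from string-fza.h, which is not attached. The sketch below uses hypothetical portable definitions of those two (the classic zero-byte bit tricks, assumed here rather than taken from glibc) to show why `index_first_zero` can use the cheaper `*_low` form while `index_last_zero` goes through `find_zero_all`: the `*_low` form may flag extra bytes above the first real zero, and a count-leading-zeros search would pick those up. It assumes a 64-bit little-endian `op_t` and uses the `long long` builtins, so it does not depend on `long` being 64 bits on the host.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t op_t;   /* assumption: 64-bit op_t, little-endian target */

/* Hypothetical portable find_zero_low: the high bit of the lowest zero
   byte of X is set; a borrow can also flag bytes above it (e.g. a 0x01
   byte right after a zero).  */
static op_t
find_zero_low (op_t x)
{
  const op_t lsb = UINT64_MAX / 0xff;      /* 0x0101010101010101 */
  return (x - lsb) & ~x & (lsb << 7);      /* lsb << 7 == 0x8080...80 */
}

/* Hypothetical portable find_zero_all: the high bit is set in exactly the
   zero bytes of X; no carry can cross a byte boundary.  */
static op_t
find_zero_all (op_t x)
{
  const op_t m = (UINT64_MAX / 0xff) * 0x7f;   /* 0x7f7f7f7f7f7f7f7f */
  return ~(((x & m) + m) | x | m);
}

/* Same shape as index_first_ / index_last_ in string-fzi.h.  */
static unsigned int
index_first_ (op_t c)
{
  return __builtin_ctzll (c) / CHAR_BIT;
}

static unsigned int
index_last_ (op_t c)
{
  return sizeof (op_t) - 1 - __builtin_clzll (c) / CHAR_BIT;
}

/* Build a word from 8 bytes in little-endian memory order, whatever the
   host byte order.  */
static op_t
word_from_bytes (const unsigned char b[8])
{
  op_t w = 0;
  for (int i = 7; i >= 0; i--)
    w = (w << CHAR_BIT) | b[i];
  return w;
}
```

For the bytes `00 01 41 41 41 41 41 41`, `find_zero_low` flags both byte 0 and byte 1, so a last-index search on it would answer 1; `find_zero_all` flags only byte 0.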
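For strcmp, the relevant helper is `index_first_zero_ne`: the first byte that is NUL in the first string or differs between the two. A minimal sketch of a word-at-a-time strcmp built on that idea follows. It is simplified by assuming both strings start on an 8-byte boundary (a real implementation also has to handle misaligned inputs) and assumes a 64-bit little-endian target; `find_zero_low`, `find_nonzero_all`, `first_zero_ne` and `strcmp_aligned_sketch` are hypothetical portable helpers, not glibc's definitions.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t op_t;   /* assumption: 64-bit, little-endian */

/* High bit set in the lowest zero byte of X (bytes above it may be
   flagged spuriously).  */
static op_t
find_zero_low (op_t x)
{
  const op_t lsb = UINT64_MAX / 0xff;
  return (x - lsb) & ~x & (lsb << 7);
}

/* High bit set in exactly the nonzero bytes of X.  */
static op_t
find_nonzero_all (op_t x)
{
  const op_t m = (UINT64_MAX / 0xff) * 0x7f;
  return (((x & m) + m) | x) & ~m;
}

/* Index of the first byte that is NUL in X1 or differs between X1 and X2,
   in the spirit of index_first_zero_ne.  Such a byte must exist.  */
static unsigned int
first_zero_ne (op_t x1, op_t x2)
{
  op_t hit = find_zero_low (x1) | find_nonzero_all (x1 ^ x2);
  return __builtin_ctzll (hit) / CHAR_BIT;
}

/* Simplified strcmp: S1 and S2 must both be 8-byte aligned.  Bytes after
   the terminating NUL in the last word are read, so the buffers must be
   padded to a whole word.  */
int
strcmp_aligned_sketch (const char *s1, const char *s2)
{
  for (;; s1 += sizeof (op_t), s2 += sizeof (op_t))
    {
      op_t w1, w2;
      memcpy (&w1, s1, sizeof w1);
      memcpy (&w2, s2, sizeof w2);
      /* Continue only while the words are equal and contain no NUL.  */
      if ((w1 ^ w2) != 0 || find_zero_low (w1) != 0)
        {
          unsigned int i = first_zero_ne (w1, w2);
          return (unsigned char) s1[i] - (unsigned char) s2[i];
        }
    }
}
```

Bytes after the NUL may differ between the two buffers; they are ignored because the NUL in `s1` is found first.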