From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Xi Ruoyao <xry111@xry111.site>,
"dengjianbo@loongson.cn" <dengjianbo@loongson.cn>
Cc: libc-alpha <libc-alpha@sourceware.org>,
caiyinyu <caiyinyu@loongson.cn>,
xuchenghua <xuchenghua@loongson.cn>,
"i.swmail" <i.swmail@xen0n.name>,
joseph <joseph@codesourcery.com>
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
Date: Thu, 22 Sep 2022 15:05:24 -0300
Message-ID: <1fec4245-9eb4-108d-722e-ba36a1df0023@linaro.org>
In-Reply-To: <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>

On 20/09/22 06:54, Xi Ruoyao wrote:
> On Mon, 2022-09-19 at 17:16 -0300, Adhemerval Zanella Netto via Libc-
> alpha wrote:
>> Do you have any breakdown if either loop unrolling or missing string-fzi.h/
>> string-fza.h is what is making difference in string routines?
>
> It looks like there are some difficulties... LoongArch does not have a
> dedicated instruction for finding a zero byte among the 8 bytes in a
> register (I guess the LoongArch SIMD eXtension will provide such an
> instruction, but the full LSX manual is not published yet and some
> LoongArch processors may lack LSX). So the assembly code submitted by
> dengjianbo relies on a register to cache the bit pattern
> 0x0101010101010101. We can't just rematerialize it (with 3
> instructions) in has_zero or has_eq etc., or the performance will
> likely be horribly bad.
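For reference, the zero-byte test being discussed is the classic word-at-a-time expression (x - 0x01..01) & ~x & 0x80..80, which is why the 0x0101010101010101 constant needs to stay live in a register. A minimal C sketch, assuming a 64-bit op_t. has_zero and has_eq follow the names in the quoted text, but these bodies (and the helper repeat_byte) are illustrative, not glibc's definitions:

```c
#include <stdint.h>

typedef uint64_t op_t;   /* assumption: 64-bit word, as on LA64 */

/* 0x0101010101010101: building this takes several instructions on
   LoongArch, so a loop wants it computed once and kept in a register.  */
#define LSB ((op_t) 0x0101010101010101ULL)
/* 0x8080808080808080 is LSB shifted left by 7.  */
#define MSB (LSB << 7)

/* Nonzero iff some byte of x is 0x00.  The lowest set 0x80 bit marks the
   first zero byte; bits above it may be false positives from the borrow.  */
static inline op_t
has_zero (op_t x)
{
  return (x - LSB) & ~x & MSB;
}

/* Nonzero iff some byte of x equals the byte replicated in rep.  */
static inline op_t
has_eq (op_t x, op_t rep)
{
  return has_zero (x ^ rep);
}

/* Replicate byte c into all eight bytes: another use of the same constant.  */
static inline op_t
repeat_byte (unsigned char c)
{
  return LSB * c;
}
```

Since the 0x80 mask is just the 0x01 mask shifted by 7, one cached register can feed both masks, plus the byte replication of c.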

The 0x0101010101010101 constant is already created in find_zero_low (as lsb),
so creating it again in another static inline function should give the
compiler enough information to reuse the value instead of materializing it
twice. So adding a LoongArch-specific index_first_zero_eq should suffice.
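A minimal sketch of such an index_first_zero_eq, assuming a little-endian target (LoongArch is little-endian) and the GCC/Clang __builtin_ctzll builtin. find_zero_low here is a local stand-in for the generic helper, not its actual glibc definition. Because both calls build the same lsb constant, the compiler should be able to compute it only once after inlining:

```c
#include <stdint.h>

typedef uint64_t op_t;   /* assumption: 64-bit word */

/* Stand-in for the generic helper: the lowest set 0x80 bit marks the
   first zero byte of x; higher bits may be borrow false positives.  */
static inline op_t
find_zero_low (op_t x)
{
  const op_t lsb = (op_t) 0x0101010101010101ULL;
  const op_t msb = lsb << 7;
  return (x - lsb) & ~x & msb;
}

/* Byte index (0..7, memory order on little-endian) of the first byte of x
   that is zero or equal to the byte replicated in repeated_c.  False
   positives only appear above a true hit, so the lowest flag is exact.
   The caller must ensure such a byte exists: __builtin_ctzll (0) is
   undefined.  */
static inline unsigned int
index_first_zero_eq (op_t x, op_t repeated_c)
{
  op_t m = find_zero_low (x) | find_zero_low (x ^ repeated_c);
  return __builtin_ctzll (m) / 8;
}
```

For the word holding "hello" followed by NULs (0x0000006F6C6C6568 on little-endian), searching for 'l' gives 2 and searching for 'z' gives 5, the index of the terminator.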

Maybe we can parametrize strchr with an extra function that performs the
final step:

  op_t found = index_first_zero_eq (word, repeated_c);
  if (extractbyte (word, found) == c)
    return (char *) (word_ptr) + found;
  return NULL;

That way LoongArch can reimplement it with a better strategy as well.

The idea of this generic implementation is precisely to find the spots where
C code cannot produce the best instructions, and to parametrize them in a way
that allows each architecture to reimplement them optimally.
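One way the final-step parametrization could look, sketched under the assumption that the hook is chosen at compile time: the generic code calls a static inline function through a macro that an architecture header may predefine. STRCHR_FINAL_STEP and generic_strchr_final_step are hypothetical names, not an existing glibc interface, and the extractbyte call from the snippet is replaced by a plain shift that assumes little-endian byte order:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t op_t;   /* assumption: 64-bit word */

/* The lowest set 0x80 bit marks the first zero byte of x.  */
static inline op_t
find_zero_low (op_t x)
{
  const op_t lsb = (op_t) 0x0101010101010101ULL;
  return (x - lsb) & ~x & (lsb << 7);
}

/* Little-endian: index of the first byte that is NUL or equal to c.  */
static inline unsigned int
index_first_zero_eq (op_t x, op_t repeated_c)
{
  return __builtin_ctzll (find_zero_low (x) | find_zero_low (x ^ repeated_c)) / 8;
}

/* Generic final step: find the first byte that is NUL or c, then return
   a pointer to it only if it is c.  */
static inline char *
generic_strchr_final_step (const op_t *word_ptr, op_t word, op_t repeated_c,
                           unsigned char c)
{
  unsigned int found = index_first_zero_eq (word, repeated_c);
  if ((unsigned char) (word >> (8 * found)) == c)
    return (char *) word_ptr + found;
  return NULL;
}

/* Hypothetical hook: an architecture header may define STRCHR_FINAL_STEP
   (for instance to an inline-asm version) before this point; otherwise the
   generic C version is used.  */
#ifndef STRCHR_FINAL_STEP
# define STRCHR_FINAL_STEP generic_strchr_final_step
#endif
```

The design keeps the loop generic and makes only the final extraction replaceable, which is the piece where the generic C code is least likely to map onto the best instruction sequence.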
>
>> Checking the last iteration [1], it seems that strchr is issuing 2 loads
>> on each loop iteration and using bit-manipulation instruction that I am
>> not sure compiler could emit with generic code. Maybe we can tune the
>> generic implementation to get similar performance, as Richard has done
>> for alpha, hppa, sh, and powerpc?
>>
>> I am asking because from the brief description of the algorithm, the
>> general idea is essentially what my generic code aims to do (mask-off
>> initial bytes, use word-aligned load and vectorized compares, extract
>> final bytes), and I am hoping that architectures would provide
>> string-fz{i,a}.h to get better code generation instead of pushing
>> for more and more hand-write assembly routines.
>
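The steps described in the quoted paragraph (mask off the initial bytes, word-aligned loads, compares on all bytes of a word at once, extraction of the final byte) can be sketched in portable C as follows. This is a hypothetical illustration assuming a 64-bit little-endian target and the GCC/Clang __builtin_ctzll builtin, with one load per iteration; it is neither the submitted LoongArch assembly nor glibc's generic strchr. Like glibc's string routines it reads whole aligned words, including bytes before s and after the terminator within the same word, which is not strictly ISO C:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t op_t;                       /* assumption: 64-bit words */
#define LSB ((op_t) 0x0101010101010101ULL)

/* The lowest set 0x80 bit marks the first zero byte of x.  */
static inline op_t
zero_mask (op_t x)
{
  return (x - LSB) & ~x & (LSB << 7);
}

/* Hypothetical word-at-a-time strchr for a little-endian target.  It never
   reads past the aligned word that contains the terminator.  */
char *
sketch_strchr (const char *s, int c_in)
{
  unsigned char c = (unsigned char) c_in;
  op_t rc = LSB * c;                                   /* c in every byte */
  unsigned int skip = (uintptr_t) s % sizeof (op_t);   /* bytes before s */
  const char *p = s - skip;                            /* aligned start */
  op_t word;

  /* 1. Mask off the initial bytes: force the bytes before s to 0xff in
     both tested values so they match neither NUL nor c.  */
  memcpy (&word, p, sizeof word);                      /* aligned load */
  op_t pre = ((op_t) 1 << (8 * skip)) - 1;
  op_t m = zero_mask (word | pre) | zero_mask ((word ^ rc) | pre);

  /* 2. Aligned loads, checking all eight bytes for NUL or c at once.  */
  while (m == 0)
    {
      p += sizeof (op_t);
      memcpy (&word, p, sizeof word);
      m = zero_mask (word) | zero_mask (word ^ rc);
    }

  /* 3. Extract the final byte: the lowest flagged byte is the first NUL
     or c; return a pointer only if it is c.  */
  unsigned int idx = __builtin_ctzll (m) / 8;
  return (unsigned char) (word >> (8 * idx)) == c ? (char *) p + idx : NULL;
}
```

For example, with `_Alignas (8) char buf[32] = "hello, world";`, sketch_strchr (buf + 5, 'l') returns buf + 10, and sketch_strchr (buf + 1, 'h') returns NULL because the 'h' at buf[0] lies in the masked-off prefix.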