public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Xi Ruoyao <xry111@xry111.site>,
	"dengjianbo@loongson.cn" <dengjianbo@loongson.cn>
Cc: libc-alpha <libc-alpha@sourceware.org>,
	caiyinyu <caiyinyu@loongson.cn>,
	xuchenghua <xuchenghua@loongson.cn>,
	"i.swmail" <i.swmail@xen0n.name>,
	joseph <joseph@codesourcery.com>
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
Date: Thu, 22 Sep 2022 15:05:24 -0300
Message-ID: <1fec4245-9eb4-108d-722e-ba36a1df0023@linaro.org>
In-Reply-To: <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>



On 20/09/22 06:54, Xi Ruoyao wrote:
> On Mon, 2022-09-19 at 17:16 -0300, Adhemerval Zanella Netto via
> Libc-alpha wrote:
>> Do you have any breakdown if either loop unrolling or missing string-fzi.h/
>> string-fza.h is what is making difference in string routines? 
> 
> It looks like there are some difficulties... LoongArch does not have a
> dedicated instruction for finding a zero byte among the 8 bytes in a
> register (I guess the LoongArch SIMD eXtension will provide such an
> instruction, but the full LSX manual is not published yet and some
> LoongArch processors may lack LSX).  So the assembly code submitted by
> dengjianbo relies on a register to cache the bit pattern
> 0x0101010101010101.  We can't just rematerialize it (with 3
> instructions) in has_zero or has_eq etc. or the performance will be
> likely horribly bad.  

The 0x0101010101010101 constant is already created in find_zero_low (lsb),
so creating it again in another static inline function should give the
compiler enough information to materialize it only once. So maybe adding a
LoongArch-specific index_first_zero_eq would suffice.
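
To illustrate the point about the constant, here is a minimal sketch (not
the actual string-fz{a,i}.h code; repeat_byte and the 64-bit op_t are
assumptions made for the example) where every helper builds
0x0101010101010101 locally. Once the helpers are inlined into the same
caller, the compiler sees identical constant expressions and is free to
keep a single copy in a register:

```c
#include <stdint.h>

/* Assumption for this sketch: a 64-bit word type.  */
typedef uint64_t op_t;

/* Hypothetical helper: replicate the byte C into every byte of a word.  */
static inline op_t
repeat_byte (unsigned char c)
{
  return (op_t) c * 0x0101010101010101ULL;
}

/* Nonzero iff some byte of X is zero.  The constants are built here,
   locally, rather than passed in from the caller.  */
static inline op_t
has_zero (op_t x)
{
  op_t lsb = repeat_byte (0x01);
  op_t msb = repeat_byte (0x80);
  return (x - lsb) & ~x & msb;
}

/* Nonzero iff some byte of X1 equals the corresponding byte of X2.  */
static inline op_t
has_eq (op_t x1, op_t x2)
{
  return has_zero (x1 ^ x2);
}

/* Nonzero iff X1 has a zero byte or a byte equal to the one in X2.
   After inlining, lsb and msb appear twice as the same constants, so
   the compiler can share them.  */
static inline op_t
has_zero_eq (op_t x1, op_t x2)
{
  return has_zero (x1) | has_eq (x1, x2);
}
```

Whether the constant really ends up in one register on LoongArch would
need to be checked on the generated code.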

Maybe we can parametrize strchr with an extra function to do what the final
step does:

    op_t found = index_first_zero_eq (word, repeated_c);
    if (extractbyte (word, found) == c)
      return (char *) (word_ptr) + found;
    return NULL;

So LoongArch can reimplement it with a better strategy as well.
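
A rough sketch of what such a hook could look like, assuming a 64-bit
op_t, little-endian byte order, and the GCC/Clang __builtin_ctzll builtin.
The names strchr_final, load_word_le, repeat_byte and chunk_strchr_index
are hypothetical and exist only for this example; an architecture would
override strchr_final (and possibly index_first_zero_eq) with its own
version:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t op_t;	/* Assumption: 64-bit words.  */

/* Hypothetical helper: replicate the byte C into every byte of a word.  */
static inline op_t
repeat_byte (unsigned char c)
{
  return (op_t) c * 0x0101010101010101ULL;
}

/* Hypothetical loader: build a word so that byte 0 in memory is the
   least significant byte, independent of host endianness.  */
static inline op_t
load_word_le (const char *p)
{
  op_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (op_t) (unsigned char) p[i] << (8 * i);
  return w;
}

/* Index of the first byte of WORD that is zero or equal to the byte
   replicated in REPEATED_C.  WORD must contain such a byte, since
   __builtin_ctzll is undefined for a zero argument.  */
static inline unsigned int
index_first_zero_eq (op_t word, op_t repeated_c)
{
  op_t lsb = repeat_byte (0x01);
  op_t msb = repeat_byte (0x80);
  op_t eq = word ^ repeated_c;
  op_t mask = (((word - lsb) & ~word) | ((eq - lsb) & ~eq)) & msb;
  return __builtin_ctzll (mask) / 8;
}

static inline unsigned char
extractbyte (op_t word, unsigned int idx)
{
  return (word >> (8 * idx)) & 0xff;
}

/* Generic final step; an architecture could supply its own version.  */
#ifndef HAVE_ARCH_STRCHR_FINAL
static inline const char *
strchr_final (const char *word_ptr, op_t word, op_t repeated_c,
	      unsigned char c)
{
  unsigned int found = index_first_zero_eq (word, repeated_c);
  if (extractbyte (word, found) == c)
    return word_ptr + found;
  return NULL;
}
#endif

/* Demo driver: run the final step on one 8-byte chunk that is known to
   contain a NUL or C, returning the match index or -1.  */
static int
chunk_strchr_index (const char *chunk, int c_in)
{
  unsigned char c = (unsigned char) c_in;
  op_t word = load_word_le (chunk);
  const char *hit = strchr_final (chunk, word, repeat_byte (c), c);
  return hit == NULL ? -1 : (int) (hit - chunk);
}
```

The main loop would stay generic and only call strchr_final once it has
found a word containing either a NUL or the searched byte.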

The idea of this generic implementation is exactly to find the spots where
C code cannot produce the best instructions, and to parametrize them in a
way that allows each architecture to reimplement them optimally.

> 
>> Checking on last iteration [1], it seems that strchr is issuing 2 loads
>> on each loop iteration and using bit-manipulation instruction that I am
>> not sure compiler could emit with generic code. Maybe we can tune the
>> generic implementation to get similar performance, as Richard has done
>> for alpha, hppa, sh, and powerpc?
>>
>> I am asking because from the brief description of the algorithm, the
>> general idea is essentially what my generic code aims to do (mask-off
>> initial bytes, use word-aligned load and vectorized compares, extract
>> final bytes), and I am hoping that architectures would provide
>> string-fz{i,a}.h to get better code generation instead of pushing
>> for more and more hand-written assembly routines.
> 
