From: Xi Ruoyao <xry111@xry111.site>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
dengjianbo <dengjianbo@loongson.cn>,
caiyinyu <caiyinyu@loongson.cn>,
libc-alpha@sourceware.org
Cc: xuchenghua@loongson.cn, huangpei@loongson.cn
Subject: Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen
Date: Thu, 03 Aug 2023 22:53:36 +0800 [thread overview]
Message-ID: <29863c0d1eb285c1eb62336c0e1592e317f89349.camel@xry111.site> (raw)
In-Reply-To: <2aed2087-c44e-8fed-83d1-5e60343c8f47@linaro.org>
On Thu, 2023-08-03 at 10:48 -0300, Adhemerval Zanella Netto wrote:
> On 03/08/23 10:27, dengjianbo wrote:
> > On 2023-08-02 20:59, Adhemerval Zanella Netto wrote:
> > > > > > > On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
> > > > > > > +#if IS_IN (libc)
> > > > > > > +# define STRLEN __strlen_aligned
> > > > > > > +#else
> > > > > > > +# define STRLEN strlen
> > > > > > > +#endif
> > > > > > Is this really an improvement over the generic implementation? It seems to
> > > > > > use a quite similar strategy.
> > > > Comparing with the code generated by compiler, the assembly code does an 16bytes loop
> > > > unrolling, and handles ascii data and non-ascii data separately which could take less
> > > > instructions to calculate the length of ascii data. besides, the assembly code using
> > > > fewer instructions to start the loop. I think the performance improvement benefits from
> > > > this. Please kindly check bench result also from:
> > > > https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out
> > > From the summarized results [1], it seems that the initial start to mask
> > > off unaligned inputs are slight better. The __strlen_aligned onl seems
> > > better to sizes larger than 32 (the 16 lenght results seems strange).
> > > Maybe you coult improve shift_find/find_zero_all/index_first on loongarch.
> > >
> > > Does it improve by explicit instructing compiler to unroll the loop?
> > As you know, the assembly versions of strlen uses the same strategy to
> > calculate string length, if assembly code only calculate 8 bytes in the
> > loop and don't separate ascii and non-ascii data, the code of loop and
> > loop end part should be the same as the compiler generated code base on
> > generic strlen. Loongarch doesn't provide instructions like alpha
> > cmpbge, so there is no much optimizations could be done on
> > find_zero_all/index_first/has_zero except we can remove some BIG_ENDIAN
> > codes.
Removing them will not make any difference because the compiler will
optimized the BIG_ENDIAN paths away.
> > Refer to the latest test results in the chart: The assembly
> > implementation vs. generic strlen implementation(compiled by using
> > CFLAGS-strlen.c += -funroll-all-loops --param
> > max-variable-expandsions-in-unroller=2) the performance
> > improvement of the assembly implementation is evident(30% ~ 40%),
> > especially in cases when the length is greater than 64 bytes.
> > Please kindly see the results via:
> > https://github.com/jiadengx/glibc_test/blob/main/strlen2/bench1/generic_strlen_with_loop_unrolling.png
>
> So maybe use the generic implementation plus the compiler flags to loop
> unrolling instead of asm optimization?
This is strange... I remember I'd attempted to add #pragma GCC unroll
for the main loop of strlen and I observed no performance gain on my
Loongson-3A5000-HV, at all. Maybe a different test environment
(hardware, compiler version, or something)?
> > > > > > This implementation fails to assembler with binutils 2.40.0.20230525:
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld $vr0,$r4,0
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld $vr1,$r4,16
> > > > > >
> > > > Sorry, it's my mistake for the wrong version of binutils. Could you please try the latest release
> > > > version 2.41?
> > > Although it should work, it is unexpected that depending of the assembler used
> > > some optimized routines are not enabled.
> >
> > In patch v2, an new configuration variable has been added to control
> > whether the LASX/LSX will be compiled according to assembler support
> > LASX/LSX or not, so it can be compiled with old versions of binutils.
>
> Yes I am aware and this seems odd, albeit not really wrong. It means that
> you will get less code coverage and optimizations depending of the used
> binutils.
>
> I would advise to follow what other architecture did to provide arch-specific
> optimization, which is either setup a minimum gcc/binutils version (for
> instance aarch64 libmvec), or encode the instructions in a binutils neutral
> mode (as the powerpc implementation I pointed out).
Hmm, this policy seems different from $OTHER_PROJECTS.
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
next prev parent reply other threads:[~2023-08-03 14:53 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-01 7:09 [PATCH 0/2] Add ifunc support and " dengjianbo
2023-08-01 7:09 ` [PATCH 1/2] LoongArch: Redefine macro LEAF/ENTRY dengjianbo
2023-08-01 7:09 ` [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen dengjianbo
2023-08-01 14:31 ` Adhemerval Zanella Netto
2023-08-02 1:25 ` caiyinyu
2023-08-02 12:25 ` dengjianbo
2023-08-02 12:59 ` Adhemerval Zanella Netto
2023-08-03 13:27 ` dengjianbo
2023-08-03 13:48 ` Adhemerval Zanella Netto
2023-08-03 14:53 ` Xi Ruoyao [this message]
2023-08-03 14:59 ` Xi Ruoyao
2023-08-03 16:29 ` Adhemerval Zanella Netto
2023-08-04 1:50 ` caiyinyu
2023-08-04 10:00 ` dengjianbo
2023-08-01 14:44 ` Xi Ruoyao
2023-08-02 12:47 ` dengjianbo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29863c0d1eb285c1eb62336c0e1592e317f89349.camel@xry111.site \
--to=xry111@xry111.site \
--cc=adhemerval.zanella@linaro.org \
--cc=caiyinyu@loongson.cn \
--cc=dengjianbo@loongson.cn \
--cc=huangpei@loongson.cn \
--cc=libc-alpha@sourceware.org \
--cc=xuchenghua@loongson.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).