Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen

From: Xi Ruoyao <xry111@xry111.site>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	dengjianbo <dengjianbo@loongson.cn>,
	caiyinyu <caiyinyu@loongson.cn>,
	 libc-alpha@sourceware.org
Cc: xuchenghua@loongson.cn, huangpei@loongson.cn
Subject: Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen
Date: Thu, 03 Aug 2023 22:53:36 +0800
Message-ID: <29863c0d1eb285c1eb62336c0e1592e317f89349.camel@xry111.site>
In-Reply-To: <2aed2087-c44e-8fed-83d1-5e60343c8f47@linaro.org>

On Thu, 2023-08-03 at 10:48 -0300, Adhemerval Zanella Netto wrote:
> On 03/08/23 10:27, dengjianbo wrote:
> > On 2023-08-02 20:59, Adhemerval Zanella Netto wrote:
> > > > > > > On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
> > > > > > > +#if IS_IN (libc)
> > > > > > > +# define STRLEN __strlen_aligned
> > > > > > > +#else
> > > > > > > +# define STRLEN strlen
> > > > > > > +#endif
> > > > > > Is this really an improvement over the generic implementation? It seems to
> > > > > > use a quite similar strategy.
> > > > > Compared with the compiler-generated code, the assembly code unrolls the
> > > > > loop to 16 bytes per iteration and handles ASCII and non-ASCII data separately,
> > > > > which takes fewer instructions to calculate the length of ASCII data. Besides,
> > > > > the assembly code uses fewer instructions to start the loop. I think the
> > > > > performance improvement comes from this. Please also check the bench results at:
> > > > > https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out
> > > > From the summarized results [1], it seems that the initial step that masks
> > > > off unaligned inputs is slightly better.  The __strlen_aligned only seems
> > > > better for sizes larger than 32 (the results for length 16 seem strange).
> > > > Maybe you could improve shift_find/find_zero_all/index_first on loongarch.
> > > > 
> > > > Does it improve if you explicitly instruct the compiler to unroll the loop?
> > As you know, the assembly versions of strlen use the same strategy to
> > calculate the string length. If the assembly code processed only 8 bytes
> > per loop iteration and did not separate ASCII and non-ASCII data, the
> > loop and loop-end code would be the same as the compiler-generated code
> > based on the generic strlen. LoongArch doesn't provide instructions like
> > alpha cmpbge, so there are not many optimizations that could be done in
> > find_zero_all/index_first/has_zero, except that we can remove some
> > BIG_ENDIAN code.

Removing them will not make any difference because the compiler will
optimize the BIG_ENDIAN paths away.
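
To illustrate the point, here is a minimal sketch (not the actual glibc
code). sketch_index_first is a hypothetical stand-in for a helper such as
index_first, and the sketch assumes the GCC/Clang predefined macros
__BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__. Because the condition is a
compile-time constant, an optimizing compiler is expected to keep only one
of the two branches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a helper like index_first(); NOT glibc code.
   MASK has the high bit set in every byte position that matched, and the
   function returns the index (in memory order) of the first match.  */
static inline unsigned int
sketch_index_first (uint64_t mask)
{
  /* ctz/clz are undefined for 0, so the caller must pass a nonzero mask.  */
  assert (mask != 0);

  /* Assumption: __BYTE_ORDER__ is the compiler-predefined macro, so this
     test is a constant and the untaken branch can be dropped entirely.  */
  if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    return (unsigned int) __builtin_ctzll (mask) / 8;
  else
    return (unsigned int) __builtin_clzll (mask) / 8;
}
```

On a little-endian target such as LoongArch, only the ctz path should
remain in the generated code, so deleting the other branch by hand would
not be expected to change the output.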

> > Refer to the latest test results in the chart, which compares the
> > assembly implementation with the generic strlen implementation
> > (compiled with CFLAGS-strlen.c += -funroll-all-loops --param
> > max-variable-expansions-in-unroller=2): the performance improvement
> > of the assembly implementation is evident (30% ~ 40%), especially
> > when the length is greater than 64 bytes.
> > Please see the results at:
> > https://github.com/jiadengx/glibc_test/blob/main/strlen2/bench1/generic_strlen_with_loop_unrolling.png
> 
> So maybe use the generic implementation plus the compiler flags for loop
> unrolling instead of the asm optimization?

This is strange... I remember I'd attempted to add #pragma GCC unroll
for the main loop of strlen and I observed no performance gain on my
Loongson-3A5000-HV, at all.  Maybe a different test environment
(hardware, compiler version, or something)?
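
A sketch of what such an experiment might look like is below. It is not
the glibc source: sketch_strlen, sketch_has_zero, sketch_word_t and the
unroll factor 4 are made up for illustration. Like any word-at-a-time
strlen, it assumes that loading the whole aligned 8-byte word containing
the terminator is acceptable on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* may_alias lets us read string bytes through a 64-bit word type.  */
typedef uint64_t __attribute__ ((__may_alias__)) sketch_word_t;

#define SKETCH_ONES  0x0101010101010101ULL
#define SKETCH_HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of W is zero.  */
static inline int
sketch_has_zero (uint64_t w)
{
  return ((w - SKETCH_ONES) & ~w & SKETCH_HIGHS) != 0;
}

/* Illustrative strlen; hypothetical name, NOT the glibc implementation.  */
size_t
sketch_strlen (const char *s)
{
  assert (s != NULL);
  const char *p = s;

  /* Byte-wise until P is aligned to the word size.  */
  while (((uintptr_t) p % sizeof (sketch_word_t)) != 0)
    {
      if (*p == '\0')
        return (size_t) (p - s);
      p++;
    }

  /* Main loop: one aligned word per step.  The pragma asks GCC to unroll
     it 4 times; the factor is an arbitrary choice for this sketch.  */
  const sketch_word_t *wp = (const sketch_word_t *) p;
#pragma GCC unroll 4
  while (!sketch_has_zero (*wp))
    wp++;

  /* The terminator is somewhere in *WP; locate it byte by byte.  */
  p = (const char *) wp;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}
```

Whether the pragma changes the generated loop at all depends on the
compiler version and optimization level, so comparing the disassembly
with and without it is probably the first thing to check.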

> > > > > > > This implementation fails to assemble with binutils 2.40.0.20230525:
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld  $vr0,$r4,0
> > > > > > ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld  $vr1,$r4,16
> > > > > > 
> > > > > Sorry, that is my mistake regarding the binutils version. Could you please try
> > > > > the latest release version, 2.41?
> > > > Although it should work, it is unexpected that, depending on the assembler used,
> > > > some optimized routines are not enabled.
> > 
> > In patch v2, a new configuration variable has been added to control
> > whether the LASX/LSX versions are compiled, depending on whether the
> > assembler supports LASX/LSX, so it can be built with old versions of binutils.
> 
> Yes, I am aware, and this seems odd, albeit not really wrong.  It means that
> you will get less code coverage and fewer optimizations depending on the
> binutils used.
> 
> I would advise following what other architectures did to provide arch-specific
> optimizations, which is either to set a minimum gcc/binutils version (for
> instance aarch64 libmvec), or to encode the instructions in a binutils-neutral
> mode (as in the powerpc implementation I pointed out).

Hmm, this policy seems different from $OTHER_PROJECTS.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

Thread overview: 16+ messages
2023-08-01  7:09 [PATCH 0/2] Add ifunc support and " dengjianbo
2023-08-01  7:09 ` [PATCH 1/2] LoongArch: Redefine macro LEAF/ENTRY dengjianbo
2023-08-01  7:09 ` [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen dengjianbo
2023-08-01 14:31   ` Adhemerval Zanella Netto
2023-08-02  1:25     ` caiyinyu
2023-08-02 12:25       ` dengjianbo
2023-08-02 12:59         ` Adhemerval Zanella Netto
2023-08-03 13:27           ` dengjianbo
2023-08-03 13:48             ` Adhemerval Zanella Netto
2023-08-03 14:53               ` Xi Ruoyao [this message]
2023-08-03 14:59                 ` Xi Ruoyao
2023-08-03 16:29                   ` Adhemerval Zanella Netto
2023-08-04  1:50                 ` caiyinyu
2023-08-04 10:00               ` dengjianbo
2023-08-01 14:44   ` Xi Ruoyao
2023-08-02 12:47     ` dengjianbo
